Abstract
Metabolomics plays a crucial role in understanding metabolic processes within biological systems. Using specific pulse sequences, NMR-based metabolomics detects small and macromolecular metabolites that are altered in blood samples. Here we propose a method called the spectral editing neural network, which can effectively edit and separate the spectral signals of small and macromolecules in 1H NMR spectra of serum and plasma based on the linewidth of the peaks. We applied the model to process the 1H NMR spectra of plasma and serum. The extracted small and macromolecular spectra were then compared with experimentally obtained relaxation-edited and diffusion-edited spectra. Correlation analysis demonstrated the quantitative capability of the model for the small molecule signals extracted from 1H NMR spectra. Principal component analysis showed that the spectra extracted by the model and those obtained by NMR spectral editing methods reveal similar group information, demonstrating the effectiveness of the model in signal extraction.
Subject terms: Metabolomics, Solution-state NMR, Cheminformatics
1H NMR-based metabolomics can detect small and macromolecular metabolites simultaneously from complex biological samples; however, signal overlap remains a challenge for accurate molecular identification and quantification. Here, the authors develop a spectral editing neural network to effectively edit and separate the spectral signals of small and macromolecules in the 1H NMR spectra of serum and plasma based on the linewidth of the peaks.
Introduction
Metabolomics is the systematic analysis of metabolites, which are the end products of the metabolic processes within a biological system1. Its primary goal is to understand the overall metabolic processes and pathways in biological systems2. One of the main analytical techniques used in metabolomics is proton nuclear magnetic resonance (1H NMR) spectroscopy, which is a powerful tool for investigating metabolic profiles3,4. It offers several advantages, including quantitative capabilities, high repeatability, minimal sample preparation requirements, non-destructiveness, and the ability to identify chemical compounds through structural analysis5,6. Blood, as the primary body fluid associated with systemic metabolism, is the ideal choice for investigating various physiological states in NMR-based metabolomics7–12. However, blood samples containing albumin, lipids, lipoprotein subclasses, and other macromolecules produce broad NMR signals that overlap with and complicate those from small molecules, hampering molecular identification and quantification13,14.
To overcome this challenge, sample preparation methods such as ultrafiltration15 or precipitation16,17 have been used. Nevertheless, these processes are not only extremely time-consuming but also result in the loss of information from macromolecules5,6. An alternative approach involves using NMR experiments to selectively edit signals of different molecules based on their diffusional and relaxation properties18. The one-dimensional (1D) diffusion-edited experiment is used for studying lipoproteins in blood3,6,14,18, while the 1D relaxation-edited experiment is employed for analyzing small molecular metabolites6,17. However, the levels of macromolecules in blood samples are linked to the risk of various diseases19,20, so performing only a 1D relaxation-edited experiment loses part of the information regarding macromolecules.
In the field of metabolomics research on blood samples, NMR experiments for large-scale cohort studies typically involve additional techniques such as 1D Carr–Purcell–Meiboom–Gill (CPMG) experiment (Bruker terminology: cpmgpr1d) to collect spectra of small molecules, and 1D diffusion-edited experiment (Bruker terminology: ledbpgppr2s1d) to collect spectra of macromolecules, in addition to the standard 1D nuclear overhauser enhancement spectroscopy (NOESY)-presat experiment (Bruker terminology: noesypr1d or noesygppr1d)5,6,12. The acquisition time for these NMR experiments on a single sample is typically several minutes, and additional instrument time can be saved if relaxation-edited and diffusion-edited experiments are not performed3,9.
In addition to the conventional relaxation-edited experiment (CPMG-presat) and diffusion-edited experiment, several methods have recently emerged for selectively editing signals of metabolites. In 2017, Rodriguez-Martinez et al. investigated the use of 2D J-resolved (JRES) 1H NMR spectroscopy to generate a virtual proton-decoupled spectrum (pJRES) for large-scale phenotyping studies in blood plasma, aiming to minimize signal overlap in 1D 1H NMR spectra21. While they successfully obtained decoupled signals of small molecules, the signals from macromolecules were lost. In 2020, Takis et al. introduced SMolESY (Small Molecule Enhancement SpectroscopY), an efficient spectral processing method designed to suppress macromolecular signals and streamline sampling22. This method effectively enhances resolution and eliminates macromolecular signals directly from the 1D 1H NMR spectrum without intensity modulation. Through mathematical differentiation of standard 1H NMR spectra, SMolESY generates small molecule-enhanced spectra with improved resolution and quantitative capabilities. Despite achieving high-resolution spectra of small molecules, the macromolecular signals were disregarded. Therefore, the development of a method that can simultaneously capture signals from both small and macromolecular metabolites is of great importance for NMR-based metabolomics research in blood.
Deep learning (DL) has made significant progress in several fields, including NMR23. DL models can effectively capture complex relationships between input NMR spectral data and desired outputs through extensive data training. Consequently, they have been extensively investigated for their applications in NMR research24–27.
The aim of this study is to simultaneously obtain spectra for small and macromolecules from the 1H NOESY-presat spectra, which contain information from both small and macromolecules and are widely used in NMR-based metabolomics studies5,6. Here, we introduce the Spectral Editing Neural Network (SENNet) model, which distinguishes signals of small and macromolecules based on the linewidth at half-height of spectral peaks and generates the spectra of small and macromolecules simultaneously from blood 1H NOESY-presat spectra, shortening the experimental time of NMR-based metabolomics studies.
Results and discussion
Dataset
Separating signals in the 1H NOESY-presat spectra of blood samples involves distinguishing overlapping peaks of small molecules, identifying low signal-to-noise spectral peaks that overlap with the broad peak in the low-field region of albumin, and dealing with the discrepancy between the high-intensity peaks of lipid chains and the bimodal peaks of lactate. To address these complex scenarios, this study carefully designed factors such as the dataset, neural network architecture, and loss function.
To effectively train neural network models and recognize underlying data patterns, extensive training with large data sets is crucial. Optimal generalization performance in real-world applications also depends on training and validation datasets that closely resemble the actual task. To achieve this, and given the similar composition of serum and plasma28, we randomly selected a 1H NOESY-presat spectrum of plasma (heparin) as the base data for training. In addition, a set of serum samples was selected as the validation dataset to optimize the hyperparameters of the model and assess its performance. These serum samples were subjected to two different 1H NMR experiments, i.e., NOESY-presat and CPMG-presat.
To generate the training dataset, the first step is to extract peak information from the 1H NOESY-presat spectrum using the peak_widths and find_peaks functions provided by the scipy.signal library29. The peak parameters extracted from the selected plasma NOESY-presat spectrum, including resonance frequency, linewidth, and peak intensity, were then randomly adjusted within a defined range. The threshold for the linewidth at half-height of the peaks distinguishing between small and macromolecules was 3.66 Hz (Supplementary Note 1 and Supplementary Fig. 1). By applying the free induction decay (FID) signal formula, we simulated spectral peaks with various parameter distributions to comprehensively model the signal distribution of potential small- and macromolecules in blood sample 1H NOESY-presat spectra. In addition, random noise was introduced to make simulated spectra more closely resemble real ones. The input features of the training dataset consist of simulated spectra containing signals from both small and macromolecules, while the output labels represent the corresponding simulated spectra of macromolecules. By subtracting the latter from the former, we can obtain the spectrum of small molecules, thus achieving simultaneous acquisition of signals from both small and macromolecules (Supplementary Note 2 and Supplementary Fig. 2).
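The peak extraction and simulation steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: it assumes that each FID component decays exponentially, so that each peak is an absorption-mode Lorentzian, and the helper names (`lorentzian`, `extract_peaks`, `simulate_pair`), the Gaussian noise model, and the example parameters are hypothetical. Only `find_peaks`, `peak_widths`, and the 3.66 Hz threshold are taken from the text.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths


def lorentzian(freq_hz, center_hz, fwhm_hz, height):
    """Absorption-mode Lorentzian line (Fourier transform of an exponentially decaying FID)."""
    half = fwhm_hz / 2.0
    return height * half**2 / ((freq_hz - center_hz) ** 2 + half**2)


def extract_peaks(spectrum, hz_per_point, min_height):
    """Return peak indices, peak heights, and linewidths at half height in Hz."""
    idx, props = find_peaks(spectrum, height=min_height)
    widths_pts = peak_widths(spectrum, idx, rel_height=0.5)[0]
    return idx, props["peak_heights"], widths_pts * hz_per_point


def simulate_pair(freq_hz, centers, fwhms, heights,
                  threshold_hz=3.66, noise_sd=0.0, rng=None):
    """Build one (input, label) pair: noisy full spectrum and macromolecule-only spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    full = np.zeros_like(freq_hz)
    macro = np.zeros_like(freq_hz)
    for c, w, h in zip(centers, fwhms, heights):
        line = lorentzian(freq_hz, c, w, h)
        full += line
        if w > threshold_hz:  # broad peaks are labeled as macromolecular
            macro += line
    full += rng.normal(0.0, noise_sd, size=freq_hz.shape)
    return full, macro


# Hypothetical example: one narrow and one broad peak on a 0.05 Hz grid.
freq = np.linspace(0.0, 200.0, 4001)
full, macro = simulate_pair(freq, centers=[60.0, 140.0],
                            fwhms=[1.0, 20.0], heights=[1.0, 0.5])
small = full - macro  # small molecule spectrum by subtraction
idx, peak_heights, widths_hz = extract_peaks(small, hz_per_point=0.05, min_height=0.1)
```

In this toy case the subtraction leaves only the narrow line, and the measured linewidth falls below the 3.66 Hz threshold, which is the property the labels rely on.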
The validation dataset used authentic 1H NOESY-presat spectra as input features, with output labels derived from small molecule peaks extracted from CPMG-presat spectra using the peak_widths and find_peaks functions from the scipy.signal library29. Reconstruction of the small molecule spectra was achieved by applying the FID signal equation, with a cut-off value of 3.66 Hz (“Theory” section, Supplementary Fig. 1). This dataset was crucial to ensure that the model could accurately handle real NOESY-presat spectra and produce reliable results30.
SENNet model
Using the analogy of image segmentation31, we tackled the problem of discriminating between small and macromolecular signals in 1H NOESY-presat spectra by adapting the Unet architecture, originally designed for medical imaging tasks32. The basic version of Unet, which includes downsampling, upsampling and skip connections, was designed to extract features from two-dimensional images. Our modified architecture, called SENNet, is suitable for 1D NMR spectra, and utilizes peak linewidth information to ensure precise spectral editing, as shown in Fig. 1. The specific parameters of the basic building blocks are given in Supplementary Table 1 and the detailed parameter scales of these building blocks are given in Supplementary Table 2 (Supplementary Note 3).
Fig. 1. The SENNet architecture is a modification of the classical Unet and consists of three elements: downsampling, upsampling, and skip connections.
The input features of the data are intensity-normalized NOESY-presat spectra with a spectral width of 12,000 Hz and 128 k data points, while the output labels represent the spectra of the corresponding macromolecules. Subtracting the latter from the former gives the spectrum of the corresponding small molecules, allowing the spectrum to be edited. The downsampling module (red arrows) consists of two successive 1D convolutional layers and a max pooling layer. These downsampling modules progressively reduce the size of the input spectrum from 128 k through 32 k, 8 k, 2 k, 512, 128, 32, 8, and 2 data points to 1 data point, while simultaneously increasing the number of channels. The upsampling module (dark yellow arrows) consists of two successive 1D convolutional layers and a 1D transposed convolutional layer. In contrast to the downsampling modules, the upsampling modules progressively increase the size of the output spectrum while reducing the number of channels. The corresponding output layer of the downsampling module is then channel-concatenated to the output layer of the upsampling module via skip connections (black arrows). The final layer (light yellow arrow) consists of three successive 1D CNN layers that reduce the output to a single channel of 128 k data points.
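The size progression in the encoder can be checked with simple arithmetic. The sketch below reads the sizes in the caption as point counts and assumes pooling factors of 4 for the first eight stages and 2 for the last; these factors are inferred from the listed sizes and are not stated in the text, and `stage_lengths` is a hypothetical helper.

```python
def stage_lengths(n_points, pool_sizes):
    """Return the spectrum length after each pooling stage, starting with the input length."""
    lengths = [n_points]
    for p in pool_sizes:
        if lengths[-1] % p != 0:
            raise ValueError("length is not divisible by the pooling factor")
        lengths.append(lengths[-1] // p)
    return lengths


# Assumed pooling factors: eight stages of 4, then one stage of 2.
sizes = stage_lengths(128 * 1024, [4] * 8 + [2])

# Digital resolution for a 12,000 Hz spectral width sampled with 128 k points.
hz_per_point = 12000.0 / (128 * 1024)
```

With these settings one data point spans roughly 0.09 Hz, so the 3.66 Hz linewidth threshold corresponds to about 40 points.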
Since the output spectrum of the model represents the signal from the macromolecules, the ideal output should be smooth to minimize sharp spikes relative to the label. Therefore, the loss function used during training focuses on minimizing errors in regions of unstable vibration. The loss function combines total variation error (TVE) and normalized mean squared error (NMSE). Total variation (TV) regularization is commonly used in computer vision tasks to suppress unwanted noise33,34. The training loss function is formulated as follows:
In this formula, “x” is the difference between the output and label spectra, NMSE is the normalized mean square error between the output and label, and “w” is the weight assigned to the TVE term.
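A loss of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the exact loss used for SENNet: NMSE is taken as the squared error divided by the energy of the label, the TVE term is the plain first-difference total variation of “x”, and the two are combined as NMSE + w·TVE. The modification applied to the TVE in this study is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def nmse(output, label):
    """Squared error normalized by the energy of the label (assumed normalization)."""
    output = np.asarray(output, dtype=float)
    label = np.asarray(label, dtype=float)
    return np.sum((output - label) ** 2) / np.sum(label ** 2)


def total_variation(x):
    """Sum of absolute differences between neighboring points of x."""
    return np.sum(np.abs(np.diff(np.asarray(x, dtype=float))))


def training_loss(output, label, w):
    """NMSE plus w times the total variation of the residual x = output - label."""
    x = np.asarray(output, dtype=float) - np.asarray(label, dtype=float)
    return nmse(output, label) + w * total_variation(x)
```

Under these assumptions a constant offset between output and label is penalized only through the NMSE term, whereas a single sharp spike in the residual adds to both terms, which is the behavior the TV term is meant to encourage.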
In this study, we used the modified TVE as the loss function to train the SENNet model. To optimize the hyperparameters of the model and evaluate its performance, we used the validation dataset consisting of NMR spectra of 113 serum samples obtained from the MetaboLights database35 (accession number MTBLS37411).
To determine the optimal value for the parameter “w”, multiple tests were conducted with a fixed number of training iterations, followed by calculation of the Pearson correlation coefficient between the model’s output small molecule spectra and the small molecule signals in the validation dataset. A value of 22 for “w” yielded the best performance, i.e., the small molecule spectra generated by the model were closest to the small molecule spectra obtained from the CPMG-presat experiments (Supplementary Note 4 and Supplementary Fig. 4). Consequently, this value was chosen as the final parameter for the SENNet model. After training, the Pearson correlation coefficient between the small molecule average bin spectra (1.8 Hz) generated by SENNet and the small molecule bin spectra (1.8 Hz) reconstructed from the averaged CPMG-presat spectrum was 92.6% (Supplementary Fig. 3).
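The comparison of binned spectra can be illustrated as follows. This is a minimal sketch: the bin integral is approximated by summing the points in each 1.8 Hz bin, trailing points that do not fill a bin are dropped, and `bin_spectrum` and `pearson` are hypothetical helper names.

```python
import numpy as np


def bin_spectrum(spectrum, hz_per_point, bin_hz=1.8):
    """Sum consecutive points into bins of approximately bin_hz width."""
    pts = max(1, int(round(bin_hz / hz_per_point)))
    n_bins = len(spectrum) // pts
    trimmed = np.asarray(spectrum[: n_bins * pts], dtype=float)
    return trimmed.reshape(n_bins, pts).sum(axis=1)


def pearson(a, b):
    """Pearson correlation coefficient between two equally long vectors."""
    return float(np.corrcoef(a, b)[0, 1])
```

Applied to the SENNet output and to the reconstructed CPMG-presat small molecule spectrum, `pearson(bin_spectrum(a, r), bin_spectrum(b, r))` would give a single similarity score of the kind used to compare candidate values of “w”.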
It is worth noting that the reconstructed small molecule spectrum represents the signals of small molecules identifiable in CPMG-presat spectra using the find_peaks and peak_widths functions from the scipy.signal library29. These two functions exclude signals below a certain intensity and do not accurately extract the intensity of overlapping peaks (Supplementary Figs. 3 and 5 and Supplementary Notes 2 and 5). While the signals in the reconstructed small molecule spectrum may differ slightly from the actual small molecule signals, this correlation analysis partially validates the performance of the model and underscores the need for our approach.
To systematically assess the generalization ability of SENNet across different spectra and to comprehensively demonstrate the applicability of the model, we applied the trained SENNet model to several datasets (Table 1). These datasets were derived from NMR metabolomics studies performed on 600 MHz and 700 MHz NMR spectrometers and included samples such as plasma (heparin, EDTA) and serum. First, the NOESY-presat spectra were processed using the trained SENNet to generate small molecule and macromolecule spectra, respectively. Then, the generated spectra were compared to experimental CPMG-presat and diffusion-edited spectra in terms of peak intensity and principal component analysis (PCA) results.
Table 1.
Sample information for datasets
| Dataset | Sample | Factor | Frequency | Experiments |
|---|---|---|---|---|
| Validation dataset | 113 Serums | Smoking | 600 MHz | noesygppr1d, cpmgpr1d |
| 600-Plasma-heparin dataset | 120 Plasma-heparin | Ibuprofen | 600 MHz | noesypr1d, cpmgpr1d, ledbpgppr2s1d |
| 600-Serum dataset | 463 Serums | Bariatric surgery | 600 MHz | noesygppr1d, cpmgpr1d, ledbpgppr2s1d |
| 700-Plasma-EDTA data | 1 Plasma-EDTA | – | 700 MHz | noesygppr1d, cpmgpr1d |
| 700-Serum data | 1 Serum | – | 700 MHz | noesygppr1d, cpmgpr1d |
Application of SENNet to plasma samples
To demonstrate the ability of SENNet to process NOESY-presat spectra of plasma samples (heparin), we analyzed the spectra of 120 plasma samples acquired on a 600 MHz NMR spectrometer (600-plasma-heparin dataset). The original study focused on using 1H NMR spectra and PCA analysis to investigate ibuprofen-plasma interactions36. As shown in Fig. 2, SENNet effectively discriminated between signals from small and macromolecules in the NOESY-presat spectra. In the δ 0.7–1.3 region, it accurately discriminated high-intensity signals from lactate and lipid chains, while detecting lower-intensity signals from free amino acids. In the δ 3.0–4.5 region, the model accurately isolated several small molecule metabolites within complex overlapping regions and skillfully managed broad baselines. Furthermore, in the δ 6.5–8.5 region, it efficiently identified 1H signals from aromatic rings and albumin in low signal-to-noise regions of low intensity.
Fig. 2. A 1H NOESY-presat spectrum of a plasma sample acquired on a 600 MHz NMR spectrometer was processed using the SENNet model.
The model effectively separated peaks with larger linewidths at half-height, as shown by the orange dashed line (macro), allowing the extraction of a small molecule spectrum similar to that obtained from the CPMG-presat experiment (CPMG). A shows the range from 5.4 ppm to 0.7 ppm, in which the model accurately discriminated between small and macromolecule metabolites in several complex situations. B shows SENNet’s effective isolation of 1H signals from aromatic rings and albumin in low signal-to-noise regions (8.6–5.65 ppm), magnified 18 times for clarity. The blue solid line represents the NOESY-presat spectrum (NOESY), the orange dashed line represents macromolecule signals separated by SENNet (macro), the green solid line represents the CPMG-presat spectrum (CPMG), the red solid line represents small molecular signals separated by SENNet (small), and the purple solid line represents the 1D diffusion-edited spectrum (LEDBP).
To assess the quantitative capability of the SENNet model, we selected ten peaks that were not affected by protein signals. We normalized the intensities of these peaks in the CPMG-presat spectra and the SENNet-extracted spectra, using the lactate peak at 4.135 ppm as a reference (Peak 0), and compared their Pearson correlation coefficients and regression coefficients (slopes). The Pearson correlation coefficients and slopes of these ten peaks are as follows (with the small molecule spectra extracted by the SENNet model on the x-axis): Peak I (δ 3.914): 0.996, 1.54; Peak II (δ 3.857): 0.996, 1.50; Peak III (δ 3.738): 0.996, 1.52; Peak IV (δ 3.268): 0.996, 1.45; Peak V (δ 1.501): 0.993, 1.41; Peak VI (δ 8.478): 0.927, 1.54; Peak VII (δ 7.821): 0.981, 1.56; Peak VIII (δ 7.209): 0.984, 1.47; Peak IX (δ 7.087): 0.994, 1.30; and Peak X (δ 6.915): 0.984, 1.48. Scatter plots of these correlation analyses are shown in Supplementary Fig. 6 (Supplementary Note 6). These analyses show that the Pearson correlation coefficients of peak intensities between small molecule spectra and CPMG-presat spectra are close to 1.0, indicating a strong association between them. In addition, the regression slope of the peak intensities was approximately 1.5 when the small molecule signal extracted by SENNet was placed on the x-axis. This can be attributed to the 100 ms T2 relaxation effect of the signals in the CPMG-presat experiment and the 100 ms mixing time in the NOESY-presat experiment (noesypr1d): lactate has a faster T2 relaxation at a total spin echo time of 100 ms than these ten peaks.
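The per-peak comparison can be sketched as follows, assuming one intensity value per sample for the peak of interest and for the lactate reference in each type of spectrum. The function name `compare_peak` and its arguments are hypothetical, and an ordinary least-squares line is assumed for the regression.

```python
import numpy as np


def compare_peak(sennet_peak, sennet_ref, cpmg_peak, cpmg_ref):
    """Return (Pearson r, slope) for reference-normalized intensities.

    x: SENNet-extracted intensity divided by the reference peak intensity.
    y: CPMG-presat intensity divided by the reference peak intensity.
    """
    x = np.asarray(sennet_peak, dtype=float) / np.asarray(sennet_ref, dtype=float)
    y = np.asarray(cpmg_peak, dtype=float) / np.asarray(cpmg_ref, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    slope, _intercept = np.polyfit(x, y, 1)  # least-squares line y = slope * x + intercept
    return r, float(slope)
```

Because both axes are divided by the reference peak, a slope different from 1 reflects how the peak and the reference are attenuated relative to each other in the two experiments, rather than a difference in absolute scaling.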
Figure 2 shows the differences between the CPMG-presat spectrum (CPMG) and the small molecular spectrum extracted by SENNet (small). The CPMG has several broader peaks attributed to incompletely attenuated lipoprotein signals, whereas SENNet excludes signals with larger peak linewidth from the small molecule signals and categorizes them as macromolecular signals (macro). In addition, Fig. 2 shows the results of processing NOESY-presat spectra using the SENNet model to obtain spectra peaks with wider linewidth (macro). In the 5.4–0.7 ppm region of Fig. 2, the extracted macromolecular signals (macro) are similar to the experimental spectra (LEDBP), although not completely identical. This difference can be attributed to the effective removal of small molecular signals, while also attenuating signals from albumin during the diffusion-edited experiment37. Based on the results presented in Fig. 2, we concluded that SENNet can accurately and effectively identify larger linewidth peaks in plasma 1H NOESY-presat spectra with a high degree of similarity to the experimental data.
To further demonstrate the power of SENNet, we performed PCA on the extracted spectra from the 120 plasma NOESY-presat spectra (small and macro) and then performed PCA in the same way on the experimental spectra (CPMG and LEDBP). Thus, each of these four data sets consisted of two sets of 60 samples each, one with and one without ibuprofen. PCA analysis of these datasets revealed that in the CPMG-presat spectra, the cumulative explained variance of the first three principal components (PCs) reached 89.64% (Fig. 3A). Similarly, in the small molecule spectra extracted by SENNet (small), the cumulative explained variance of the first three PCs reached 90.74% (Fig. 3C). In addition, slight differences were observed in the PCA analysis of the diffusion-edited (LEDBP) spectra and the SENNet-extracted macromolecular signals (Macro), as shown in Fig. 3B, D.
Fig. 3. The PCA score plots of experimental spectra (CPMG and LEDBP) and SENNet-extracted spectra (small and macro) for 120 plasma samples divided into two groups of 60 samples each.
These plots are based on CPMG-presat spectra (89.64%) (A), LEDBP spectra (94.04%) (B), small molecule signals (90.74%) (C), and macromolecular signals (94.82%) (D) obtained by processing NOESY-presat spectra. Comparison of the score plots of the SENNet-extracted spectra with the experimental NMR spectra (CPMG and LEDBP) showed similar sample distribution patterns and clusters within the dataset. Red triangle symbols represent samples with ibuprofen, while blue circle symbols represent samples without ibuprofen.
Comparing Fig. 3A (CPMG) with Fig. 3C (small) and Fig. 3B (LEDBP) with Fig. 3D (Macro), we observed that the PCA score plots have similar sample distribution patterns and clustering (Fig. 3). These results suggested that SENNet’s separation of small and macromolecule data achieves similar sample grouping and pattern recognition functions as conventional NMR spectral editing methods, albeit with some nuances. The subtle differences between Fig. 3A (CPMG) and Fig. 3C (small) are due to the retention of some macromolecular signals, such as lipoproteins, in the CPMG spectra. The small difference between Fig. 3B (LEDBP) and Fig. 3D (Macro) is due to the fact that SENNet was able to capture all macromolecular signals, whereas LEDBP lost some signals from albumin. These differences are very well illustrated in Fig. 2.
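The cumulative explained variance reported for the PCA can be computed as in the sketch below. It assumes a samples-by-bins matrix that is only mean-centered before decomposition; any additional scaling applied in this study is not reproduced, and `pca_scores` is a hypothetical helper.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Return PCA scores and explained-variance ratios for the leading components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each variable (bin)
    U, S, _Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Hypothetical usage on a random matrix of 20 samples and 50 bins.
rng = np.random.default_rng(0)
scores, ratio = pca_scores(rng.normal(size=(20, 50)))
cumulative = np.cumsum(ratio)  # last entry: variance explained by the first three PCs
```

Running the same function on the SENNet-extracted spectra and on the experimental spectra yields score plots and cumulative percentages that can be compared directly.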
We also used the small molecule spectra extracted by SENNet to predict drug-plasma interactions, which could be useful in guiding personalized therapy. The ibuprofen-induced change in Euclidean distance can be defined as an interaction index that measures the strength of the drug-plasma interaction36. In this study, we evaluated the Pearson correlation coefficient between the total-sum-normalized spectral bin integrals (1.8 Hz) and the high-dimensional Euclidean distances for 60 pairs of plasma samples, where one group of samples was supplemented with ibuprofen and the other was not. This coefficient measures the relationship between the Euclidean distances of the paired samples and the signal intensities in the absence of ibuprofen, thus capturing the effects induced by ibuprofen36,38.
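The calculation described above can be sketched as follows, assuming two matrices of bin integrals with one row per sample pair (without and with ibuprofen). The function names are hypothetical, and details such as bin selection follow the cited work rather than this sketch.

```python
import numpy as np


def total_sum_normalize(bins):
    """Scale each spectrum (row) so that its bin integrals sum to 1."""
    bins = np.asarray(bins, dtype=float)
    return bins / bins.sum(axis=1, keepdims=True)


def pair_distances(without_drug, with_drug):
    """Euclidean distance between the two spectra of each sample pair."""
    diff = np.asarray(with_drug, dtype=float) - np.asarray(without_drug, dtype=float)
    return np.linalg.norm(diff, axis=1)


def bin_distance_correlation(without_drug, distances):
    """Pearson r between each bin (across pairs, drug-free spectra) and the pair distances.

    A bin that is constant across samples has zero variance and yields NaN.
    """
    x = np.asarray(without_drug, dtype=float)
    d = np.asarray(distances, dtype=float)
    xc = x - x.mean(axis=0)
    dc = d - d.mean()
    denom = np.sqrt((xc**2).sum(axis=0) * (dc**2).sum())
    return (xc * dc[:, None]).sum(axis=0) / denom
```

The absolute values of the returned coefficients, one per bin, are the quantities that a heat map of this analysis would display.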
Figure 4 displays a heat map of the Pearson correlation coefficients between the Euclidean distances and the spectral bin integrals obtained from the CPMG-presat experiments and SENNet, respectively, where the color scale represents the absolute value of the Pearson correlation coefficient between the signal bin integrals and the change in Euclidean distance. Higher correlation coefficients (red) indicate that the metabolite contributes more to the classification in the multivariate analysis, which helps to visually identify significantly correlated peaks. The SENNet-extracted small molecule spectra in Fig. 4 were processed in the same way as the CPMG spectra. This correlation analysis highlights the advantages of using SENNet-extracted small molecule spectra to investigate drug-plasma interactions, especially compared with CPMG-presat spectra that contain incompletely attenuated macromolecular signals. The results shown in Fig. 4 provide important insights into the identification of potential biomarkers and also highlight the utility of SENNet in small molecule biomarker discovery.
Fig. 4. Pearson correlation plot between group Euclidean distances in high-dimensional space and the bin integrals of 60 samples.
The plot shows the Pearson correlation coefficients from two different data sets (i.e., CPMG and small), with the absolute values of the coefficients corresponding to the colors of the variables. A higher correlation coefficient indicates a greater contribution of the metabolite to the group classification in the multivariate analysis. The CPMG-presat spectra contain undecayed macromolecular signals, as indicated by the arrows.
In metabolomics research, some studies on plasma use EDTA as an anticoagulant12. To further demonstrate SENNet’s ability to process NMR spectra of plasma samples containing EDTA, we analyzed the spectrum of a plasma-EDTA sample collected on a 700 MHz NMR spectrometer (700-plasma-EDTA data). In Supplementary Fig. 7 we can see that the SENNet can also separate the small and macromolecules in NOESY-presat spectra that are acquired at 700 MHz (Supplementary Note 6).
Based on the above findings, SENNet effectively processes 1H NOESY-presat spectra of plasma samples acquired on 600 MHz and 700 MHz NMR spectrometers, accurately extracting signals from both small and macromolecules.
Application of SENNet to serum samples
To demonstrate the ability of SENNet to process serum samples, we selected a dataset collected on a 600 MHz NMR spectrometer (600-serum dataset). This dataset consisted of samples from 106 severely obese patients collected at multiple time points before and after (3 months, 6 months, 9 months, and 12 months) gastric bypass surgery9. As shown in Fig. 5, by applying the SENNet model, we successfully separated small and macromolecular signals in the serum NOESY-presat spectra, which is similar to the processing of plasma NMR spectra. The obtained small and macromolecule profiles showed similar signal intensities and distributions compared to the CPMG-presat and LEDBP profiles. Similarly, incompletely attenuated lipid signals were observed in the CPMG-presat spectra. In contrast, the extracted macromolecular signals contained all albumin signals that were completely attenuated in the LEDBP spectra.
Fig. 5. A 1H NOESY-presat spectrum of a serum sample acquired on a 600 MHz NMR spectrometer was processed using the trained SENNet model.
The model effectively separated peaks with larger linewidths at half-height, as shown by the orange dashed line (macro), allowing the extraction of a small molecule spectrum similar to that obtained from the CPMG-presat experiment (CPMG). A shows the range from 5.4 ppm to 0.7 ppm, while B shows SENNet’s effective isolation of 1H signals from aromatic rings and albumin in low signal-to-noise regions (8.6–5.65 ppm), magnified 20 times for clarity. The blue solid line represents the NOESY-presat spectrum (NOESY), the orange dashed line represents macromolecule signals separated by SENNet (macro), the green solid line represents the CPMG-presat (cpmgpr1d) spectrum (CPMG), the red solid line represents small molecule signals separated by SENNet (small), and the purple solid line represents the 1D diffusion-edited spectrum (LEDBP).
To further validate the quantitative capability of SENNet, we selected 10 peaks unaffected by protein signals. We normalized the intensities of these peaks in the CPMG-presat spectra and the SENNet-extracted small molecule spectra, using the lactate peak at 4.135 ppm as a reference (Peak 0), and calculated the Pearson correlation coefficients and regression coefficients (slopes) between the two. The results are as follows: Peak I (δ 3.914): 0.992, 1.07; Peak II (δ 3.859): 0.994, 0.99; Peak III (δ 3.757): 0.998, 0.95; Peak IV (δ 3.271): 0.994, 1.03; Peak V (δ 1.502): 0.981, 0.89; Peak VI (δ 8.481): 0.984, 1.06; Peak VII (δ 7.452): 0.984, 0.98; Peak VIII (δ 7.218): 0.983, 0.98; Peak IX (δ 7.008): 0.953, 0.91; and Peak X (δ 6.922): 0.992, 1.09. Scatter plots of these correlation analyses are shown in Supplementary Fig. 8 (Supplementary Note 7). These analyses show that the Pearson correlation coefficients of peak intensities between small molecule spectra and CPMG-presat spectra are close to 1.0, indicating a strong association between them. In addition, the regression slope of the peak intensities was approximately 1.0 when the small molecule signal extracted by SENNet was placed on the x-axis. This can be attributed to the 76.8 ms T2 relaxation effect of the signals in the CPMG-presat experiment and the 10 ms mixing time in the NOESY-presat experiment (noesygppr1d), and indicates that these peaks show relaxation behavior similar to that of lactate in this CPMG-presat experiment.
The SENNet model was then applied to extract small and macromolecular spectra from all the 1H NOESY-presat spectra. The extracted spectra (small and macro) and the spectra obtained from the NMR experiments (CPMG and LEDBP) were each subjected to PCA to demonstrate the efficiency of the SENNet method. Each sub-dataset contained samples taken before and after surgery (3 months, 6 months, 9 months, and 12 months) with sample sizes of 105, 97, 98, 92, and 71, respectively, for a total of 463 samples. PCA of the CPMG-presat spectra showed a cumulative explained variance of 81.66%. In contrast, the cumulative explained variance of the first three PCs of the small molecule spectra extracted by SENNet was 88.30% (Fig. 6A). PCA of the macromolecular signals extracted by SENNet and of the LEDBP spectra showed that the cumulative explained variance of the first three PCs was 92.85% and 93.46%, respectively (Supplementary Fig. 9). In addition, the score plots show a clear grouping pattern and provide detailed insight into the five time points (Fig. 6 and Supplementary Fig. 9).
Fig. 6. Comparison of the PCA score plots generated from CPMG-presat experimental spectra and processed small molecular spectra for 463 serum samples.
The datasets consist of samples taken before and after surgery (3 months, 6 months, 9 months, and 12 months), with sample sizes of 105, 97, 98, 92, and 71, totaling 463 samples. The PCA score plots demonstrate similar distribution patterns, capturing inter-group and intra-group separations. A shows the PCA score plot (PC1 vs PC2) based on CPMG-presat spectra of blood serum samples. B shows the paired PCA of samples from 106 severely obese patients using the small molecular spectra processed by SENNet. The plots exhibit consistent distribution patterns, with the horizontal line marker, left triangle marker, vertical line marker, cross marker, and plus marker representing samples collected before surgery and at 3 months, 6 months, 9 months, and 12 months after surgery, respectively.
Figure 6 and Supplementary Fig. 9 compare the PCA score plots of the spectra obtained by the SENNet model and by the NMR experiments (CPMG and LEDBP). For both small and macromolecular information, the two methods (NMR experiment and SENNet) reveal similar sample distribution patterns and effectively capture both inter- and intra-group separations, demonstrating the reliability of the SENNet model in capturing spectral information from the NMR spectral signal (NOESY-presat). As in the case of plasma samples, the minor differences between the two methods arise mainly because the CPMG-presat spectra contain signals from macromolecules and the LEDBP spectra lose signals from albumin, whereas SENNet is able to completely separate all signals from both small and macromolecules. Notably, while the CPMG-presat and LEDBP experiments each took approximately 8 min per sample, SENNet generated the signals of both small and macromolecules in less than 1.0 s, and the processing could be run on a personal computer.
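The cumulative explained variance used in this comparison can be illustrated with a minimal sketch. It assumes mean-centred spectra and a plain SVD-based PCA; the scaling actually applied in the study is not restated here, and the function name and the toy data are hypothetical.

```python
import numpy as np

def cumulative_explained_variance(spectra, n_components=3):
    """Cumulative explained variance (%) of the leading principal components.

    spectra: 2D array, one row per sample and one column per spectral point.
    Assumption: spectra are only mean-centred before PCA (no other scaling).
    """
    X = np.asarray(spectra, dtype=float)
    X = X - X.mean(axis=0)                      # centre each spectral variable
    s = np.linalg.svd(X, compute_uv=False)      # singular values of the centred data
    ratio = s**2 / np.sum(s**2)                 # explained-variance ratio per PC
    return 100.0 * np.cumsum(ratio[:n_components])

# Hypothetical toy data: 50 "spectra" driven by three latent factors plus noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 3)) * np.array([10.0, 5.0, 2.0])
loadings = rng.normal(size=(3, 200))
toy_spectra = scores @ loadings + 0.1 * rng.normal(size=(50, 200))

cev = cumulative_explained_variance(toy_spectra)
print(cev)  # three increasing percentages for PC1, PC1-2, PC1-3
```

Running the same function on two spectral matrices (for example, experimentally edited and model-extracted spectra) gives directly comparable percentages, provided both are pre-processed identically.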
To further demonstrate the generalizability of SENNet, we processed a serum NMR spectrum collected on a 700 MHz NMR spectrometer (700-serum data). Supplementary Fig. 10 shows that SENNet can also separate the small and macromolecular signals in 700 MHz NOESY-presat spectra (Supplementary Note 7). In summary, SENNet effectively distinguishes signals from both small and macromolecules in NOESY-presat spectra obtained from serum samples at both 600 MHz and 700 MHz.
Discussion
In the 1H CPMG-presat experiment, the spin echo time is typically set to an intermediate value to preserve small molecule signals while attenuating macromolecule signals, so partial lipoprotein signals remain in the spectrum39. In diffusion-edited experiments, by contrast, the signal from albumin is absent due to signal attenuation3. The small molecule spectrum generated by the SENNet model does not include signals from macromolecules, whereas the macromolecule spectrum generated by SENNet includes all signals from macromolecules; both therefore differ slightly from the spectra obtained with the NMR spectral editing methods (CPMG and LEDBP). The main difference between macromolecular and small molecule peaks is their linewidth, which is narrower for small molecules. SENNet can recognize this feature and distinguish between the two types of peaks.
In the PCA score plots shown in Figs. 3, 6, and Supplementary Fig. 9, the spectra from different sources (i.e., extracted by SENNet and obtained by NMR spectral editing methods) show similar sample distributions, suggesting that our proposed SENNet method is effective in extracting macro/small molecule signals. For small molecule signals from different sources, the difference between their PCA score plots arises because some macromolecular signals, such as those of lipoproteins, are retained in the CPMG-presat spectra, whereas the small molecular signals extracted by SENNet exclude signals with larger linewidths. The difference between the PCA score plots of macromolecular signals from different sources arises because SENNet is able to capture all macromolecular signals, whereas LEDBP loses some signals from albumin. These differences are well illustrated in Figs. 2 and 5.
With regard to the quality of the spectra, the key experimental parameters for the CPMG-presat experiment are the spin echo time, the relaxation delay, and the acquisition time, while for the NOESY-presat experiment they are the mixing time, the relaxation delay, and the acquisition time. These parameters affect the intensity of the spectral peaks, which in turn affects the slope of the regression between the small molecule signals from the two sources (i.e., CPMG-presat and extracted by SENNet). According to the regression analysis, a short mixing time of 10 ms when using the noesygppr1d pulse sequence helps suppress relaxation effects for quantification6.
The above results demonstrated the effectiveness of the SENNet model in processing 1H NOESY-presat spectra of plasma and serum samples obtained from 600 MHz and 700 MHz spectrometers, which accurately extracted the signals of both small and macromolecules. Subsequent PCA analyses of each of the four sub-datasets (i.e., small, macro, CPMG, and LEDBP) showed that the spectra generated by the SENNet model were similar to the results of the corresponding NMR spectral editing methods (e.g., CPMG and LEDBP) in terms of inter- and intra-group distributions, which means SENNet may replace CPMG-presat and diffusion-edited experiments in NMR metabolomics research, significantly saving experimental time, at least for 600/700 MHz serum/plasma 1H spectra.
In addition to serum and plasma samples, NMR metabolomics has been performed on other biological fluids such as urine40, saliva41,42, cerebrospinal fluid (CSF)43,44, and exhaled breath condensate (EBC)45. For blood samples, CPMG-presat and diffusion-edited experiments may be performed in addition to the NOESY-presat experiment. For these other sample types, however, the NOESY-presat experiment alone is usually sufficient, owing to their unique compositions and the pre-analytical procedures involved in NMR sample preparation5,6. When needed, this method can also be extended to signal separation tasks in other NMR samples, as long as appropriate training data is provided for model training.
To achieve accurate separation of small and macromolecular signals in 1H NOESY-presat spectra of blood samples, the architecture of the neural network, the loss function, and the prior knowledge of the assignment of blood sample NMR signals are crucial. It should be noted that, in the peak picking step of this study, a value of 3.66 Hz is used as the linewidth threshold, which is applicable to the NMR spectra of blood samples. This also indicates that, for samples lacking prior knowledge, the method may exhibit some errors. Fortunately, common sample types for NMR-based metabolomics research have already been extensively studied by the NMR community.
The limitation of this signal separation method is therefore that it relies on a data-driven DL model and is best suited to sample types for which prior knowledge is available. It cannot be directly applied to unknown samples. In contrast, the traditional NMR-based spectral editing methods, such as relaxation-edited (CPMG), diffusion-edited (LEDBP), and combined diffusion- and relaxation-edited proton NMR experiments, have wider applicability.
Conclusion
The SENNet model used DL techniques to process 1H NOESY-presat spectra as digital signals. Through extensive training and validation with synthetically generated data, it effectively discriminated between small and macromolecular signals in 1H NOESY-presat spectra of plasma and serum samples based on peak linewidths. The model retained quantitative information while extracting small molecular peak intensities highly correlated with those from CPMG-presat experiments, and excluding all undecayed macromolecular signals. SENNet’s handling of 1H NOESY-presat spectra provides a functional alternative to conventional NMR spectral editing methods (e.g., CPMG-presat and LEDBP). Application of the SENNet model to multiple NMR metabolomics datasets allows PCA analysis of the extracted small and macromolecular signals, revealing similar distribution patterns to those obtained from CPMG-presat and LEDBP experiments. The SENNet model provides a solution for obtaining small and macromolecular signals from plasma and serum samples, and is particularly suitable for 1H NOESY-presat spectra obtained on 600 MHz and 700 MHz NMR spectrometers. Particularly useful for large-scale cohort studies, this data post-processing approach significantly reduces sampling time compared to conventional NMR metabolomics workflows, thereby increasing the efficiency of NMR metabolomics studies on plasma and serum samples.
Methods
NMR datasets
In this study, we used several NMR metabolomics datasets focusing on human blood samples (serum and plasma). These datasets were acquired using either 600 MHz or 700 MHz NMR spectrometers, and the corresponding studies have been published previously9–11,36. To illustrate the robustness of the SENNet model, we specifically selected three datasets from the MetaboLights database35.
Validation dataset
The original study aimed to investigate smoking-induced metabolic changes by comparing the serum profiles of smokers and non-smokers using fingerprint data11. NMR experiments, including noesygppr1d and cpmgpr1d, were performed using a Bruker 600 MHz spectrometer. A total of 113 serum samples were processed in our investigation. The pre-processed 1H NMR spectra were retrieved from MetaboLights35 under accession number MTBLS374.
600-plasma-heparin dataset
The original study focused on evaluating the metabolic variability of human plasma and its relevance to ibuprofen-plasma interactions and personalized treatment using 1H NMR spectroscopy and PCA analysis36. The dataset consisted of 60 pairs of human plasma (Li-heparin) samples, one with and one without ibuprofen, randomly selected from our previous datasets. NMR experiments, including noesypr1d, cpmgpr1d, and ledbpgppr2s1d, were performed using a Bruker 600 MHz spectrometer.
600-serum dataset
The original study investigated the metabolic changes after bariatric surgery at different time intervals9. Serum samples were collected from 106 severely obese patients before surgery and at 3 months, 6 months, 9 months, and 12 months after surgery. Our study included 463 serum samples according to the information available in the repository. NMR experiments, including noesygppr1d, cpmgpr1d, and ledbpgppr2s1d, were performed using a Bruker 600 MHz spectrometer. The pre-processed 1H NMR spectra were obtained from MetaboLights35 under accession number MTBLS242.
700-plasma-EDTA data and 700-serum data
The original study investigated the influence of premature adrenarche (PA) on the metabolic profiles of children with idiopathic PA10. Plasma-EDTA and serum samples were collected from 52 children with idiopathic PA and 48 age-matched controls. NMR experiments, including noesygppr1d and cpmgpr1d, were performed on both plasma-EDTA and serum samples using a Bruker 700 MHz NMR spectrometer. The pre-processed 1H NMR spectra were obtained from MetaboLights35 under accession number MTBLS2387.
Details of sample collection and preparation, NMR spectral acquisition, and main results are provided in the original manuscripts. Details of NMR spectral acquisition, processing, and subsequent post-processing using SENNet are provided in Supplementary Note 8. Simplified sample information for these datasets is given in Table 1, while the NMR experimental parameters of these datasets used in the study are given in Supplementary Table 3. Ethical considerations related to the studies involving the datasets used are fully described in the respective original articles.
Theory
Small and macromolecular signals in NMR spectra
In high-resolution 1H NMR spectra, there are distinct differences in the signal peaks originating from small and macromolecules. Macromolecular signals typically exhibit broad lines, whereas signals from small molecules appear as sharp lines36. This difference is attributed to faster transverse relaxation for macromolecules and slower transverse relaxation for small molecules46,47.
Researchers have extensively studied blood samples using NMR spectroscopy, and all the signals in the 1H NMR spectra have been accurately assigned46. This prior knowledge could be used to determine the half-height linewidth threshold for distinguishing between small and macromolecular peaks. The procedure for determining this threshold is as follows. First, the peak_widths and find_peaks functions from the scipy.signal library29 were used to extract the linewidth information of all identifiable peaks from relaxation-edited (CPMG-presat) and diffusion-edited 1H NMR spectra of multiple samples. Then, statistical comparisons of peak linewidths in relaxation-edited and diffusion-edited spectra, combined with prior knowledge of NMR signal assignments, showed that a cutoff value of 3.66 Hz could roughly distinguish between small and macromolecular signals (Supplementary Note 1 and Supplementary Fig. 1).
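The linewidth extraction step can be sketched as follows. This is a minimal illustration rather than the study's actual script: the synthetic two-peak spectrum, the height threshold of 0.05, and the helper name lorentzian are assumptions; only find_peaks, peak_widths, and the 3.66 Hz cutoff come from the text.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

def lorentzian(freq_hz, center_hz, fwhm_hz, height):
    """Absorptive Lorentzian line with a given full width at half maximum."""
    hwhm = fwhm_hz / 2.0
    return height * hwhm**2 / ((freq_hz - center_hz) ** 2 + hwhm**2)

# Digital resolution for 12,000 Hz over 128 k points.
hz_per_point = 12000.0 / 131072
freq = np.arange(8192) * hz_per_point            # a ~750 Hz window, for brevity

# Hypothetical spectrum: one sharp peak (1 Hz) and one broad peak (20 Hz).
spectrum = lorentzian(freq, 200.0, 1.0, 1.0) + lorentzian(freq, 500.0, 20.0, 0.5)

# Locate peaks, then measure their widths at half height (in points).
peaks, _ = find_peaks(spectrum, height=0.05)     # height threshold is an assumption
widths_pts = peak_widths(spectrum, peaks, rel_height=0.5)[0]
widths_hz = widths_pts * hz_per_point

# Classify each peak against the 3.66 Hz cutoff quoted in the text.
is_small_molecule = widths_hz < 3.66
for f, w, small in zip(freq[peaks], widths_hz, is_small_molecule):
    print(f"{f:7.1f} Hz  width {w:5.2f} Hz  {'small' if small else 'macro'}")
```

Note that peak_widths measures the width relative to each peak's prominence, so heavily overlapped peaks in real spectra will give less clean widths than this isolated toy case.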
Generation of simulated spectral peaks
In 1D high-resolution 1H NMR spectra, peaks exhibit Lorentzian line shapes and are characterized by their resonance frequency, linewidth at half-height, and peak intensity. Based on the theory of NMR signal generation, the 1D FID signal of a single peak can be generated using the formula FID(t) = A × exp(2πift) × exp(−t/T2), where A, i, f, t, and T2 represent the peak intensity, the imaginary unit, the resonance frequency, time, and the transverse relaxation time, respectively. High-resolution NMR spectra (e.g., NOESY-presat) consist essentially of these basic peak patterns. For the 1H NMR spectrum of a given system, such as a blood sample, these parameters are distributed over a range, including different resonance frequencies, peak intensities, and linewidths. Because small molecules give narrower peaks than macromolecules, the linewidth allows signals from the two classes to be differentiated accurately and effectively.
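As a minimal sketch of this formula, the following code simulates one FID and Fourier transforms it. The spectral width and number of points follow the values given elsewhere in the text; the amplitude, the 1000 Hz offset, and the function name are hypothetical. It also uses the standard Lorentzian relation FWHM = 1/(πT2) to choose T2 so that the peak sits at the 3.66 Hz cutoff.

```python
import numpy as np

def simulate_fid(t, amplitude, freq_hz, t2_s):
    """Single-peak FID: A * exp(2*pi*i*f*t) * exp(-t/T2)."""
    return amplitude * np.exp(2j * np.pi * freq_hz * t) * np.exp(-t / t2_s)

sw_hz = 12000.0                  # spectral width
n_points = 131072                # 128 k complex points
t = np.arange(n_points) / sw_hz  # dwell time = 1/SW for complex sampling

# For a Lorentzian line, FWHM = 1 / (pi * T2); pick T2 for a 3.66 Hz wide peak.
target_fwhm_hz = 3.66
t2 = 1.0 / (np.pi * target_fwhm_hz)

fid = simulate_fid(t, amplitude=1.0, freq_hz=1000.0, t2_s=t2)
fid[0] *= 0.5                    # halve the first point to avoid a baseline offset

# Fourier transform; the real part is the absorptive Lorentzian.
spectrum = np.fft.fftshift(np.fft.fft(fid)).real
freq_axis = np.fft.fftshift(np.fft.fftfreq(n_points, d=1.0 / sw_hz))

# Measure the width by counting points at or above half the maximum.
peak_freq = freq_axis[np.argmax(spectrum)]
fwhm_measured = np.count_nonzero(spectrum >= spectrum.max() / 2) * sw_hz / n_points
print(peak_freq, fwhm_measured)
```

Summing several such FIDs with different A, f, and T2 before the transform gives a multi-peak simulated spectrum.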
Generation of the training dataset
To effectively distinguish between small and macromolecular signals, transfer learning can be used48. This technique involves adapting pre-trained models to new data sets. During transfer learning, data is divided into two categories: source data and target data. Source data refers to additional data that is not directly related to the task at hand, while target data is directly related to the task. In typical transfer learning scenarios, the source data is often large, while the target data is relatively small. In this study, the source data was labeled while the target data was unlabeled, highlighting the importance of ensuring similarity between the source and target data. To achieve this, we randomly selected a real plasma NOESY-presat spectrum as the base data for the training data set, as plasma exhibits similarities to serum.
To improve the model’s ability to generalize, our training data set needed to include a variety of instances with peak overlap. To achieve this, we used the FID formula to simulate peak signals characterized by different parameters, which were then summed. Fast Fourier Transform was used to derive the complete simulated spectrum. This spectrum showed a broad distribution of resonance frequencies, coupled with variations in peak intensity and linewidth, encompassing both simulated small and macromolecular signals. To make the spectrum more realistic, random noise was introduced. This noise not only mimicked real-world conditions but also added complexity to the dataset, increasing the robustness and generalization capabilities of the DL model trained on such data.
To further enhance the generalization capability of the DL model for blood-related samples, we used the find_peaks and peak_widths functions from the scipy.signal library to extract peak information from plasma 1H NOESY-presat spectra29. During this process, we generated lists of peak information for both small and macromolecules, and then used these peak parameters to create a reference spectrum. We then randomly selected peaks and varied their parameters (e.g., resonance frequency, linewidth, and intensity) within a certain range to generate randomized variation spectra and their corresponding macromolecular spectra. This approach not only generates a large number of accessible spectra but also increases the complexity of the generated data to improve the generalization capability of the model. In addition, by randomly generating peaks representing small and macromolecules within specified ranges of intensity, resonance frequency, and linewidth, we simulated additional NMR signals in the spectra and thus further increased the diversity of the dataset. When training the neural network model, we used simulated spectra containing both small molecules and macromolecules as input data, while the macromolecule spectra were used as output labels. The difference between input and output represents the small molecule spectra (Supplementary Note 2 and Supplementary Fig. 2).
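The input/label construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' generator: peaks are drawn directly as frequency-domain Lorentzians instead of via simulated FIDs, and the peak counts, parameter ranges, noise level, and function names are all hypothetical. Only the 3.66 Hz cutoff and the choice of label (the macromolecular spectrum) come from the text.

```python
import numpy as np

def lorentzian(freq_hz, center_hz, fwhm_hz, height):
    """Absorptive Lorentzian line with a given full width at half maximum."""
    hwhm = fwhm_hz / 2.0
    return height * hwhm**2 / ((freq_hz - center_hz) ** 2 + hwhm**2)

def make_training_pair(rng, n_points=4096, hz_per_point=12000.0 / 131072,
                       n_small=20, n_macro=5, cutoff_hz=3.66, noise_sd=0.01):
    """Return (input spectrum, macromolecular label, noise-free small spectrum)."""
    freq = np.arange(n_points) * hz_per_point
    small = np.zeros(n_points)
    macro = np.zeros(n_points)
    # Narrow peaks (below the cutoff) stand in for small molecules.
    for _ in range(n_small):
        small += lorentzian(freq, rng.uniform(freq[0], freq[-1]),
                            rng.uniform(0.5, cutoff_hz), rng.uniform(0.1, 1.0))
    # Broad peaks (above the cutoff) stand in for macromolecules.
    for _ in range(n_macro):
        macro += lorentzian(freq, rng.uniform(freq[0], freq[-1]),
                            rng.uniform(cutoff_hz, 60.0), rng.uniform(0.1, 1.0))
    noise = rng.normal(0.0, noise_sd, n_points)
    mixed = small + macro + noise      # network input
    return mixed, macro, small         # macro is the training label

rng = np.random.default_rng(0)
mixed, macro, small = make_training_pair(rng)

# Input minus label recovers the small molecule spectrum (plus noise).
residual = mixed - macro
print(np.std(residual - small))        # close to the assumed noise level
```

Using the macromolecular spectrum as the label means the small molecule spectrum is obtained by subtraction, so noise in the input ends up in the small molecule output in this sketch.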
Generation of the validation dataset
In this study, our training dataset consisted of labeled data, while the target dataset was unlabeled, and the ultimate goal is to ensure the model’s ability to generalize to the target data. Therefore, in addition to increasing the similarity between the training and target datasets, we created a validation set to further ensure the performance of the model. The validation set used real 1D NOESY-presat spectra as input and small molecule spectra extracted from 1D CPMG-presat spectra as output labels (Supplementary Note 2 and Supplementary Fig. 3). Considering that real CPMG-presat spectra contain incompletely attenuated signals from macromolecules, we extracted spectral peak information from real CPMG-presat spectra and generated reconstructed small molecular spectra (Supplementary Note 5 and Supplementary Fig. 5). Thus, the reconstructed small molecule signal spectra exclude signals from macromolecules. Since the half-height linewidth of peaks in NMR spectra is positively correlated with molecular weight, we used this information for spectral editing. To generate the training and validation datasets, we set the half-height linewidth cutoff to 3.66 Hz.
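The reconstruction of small molecule labels from CPMG-presat spectra might look like the sketch below. The actual procedure is described in Supplementary Note 5 and is not reproduced here, so everything apart from the scipy.signal calls and the 3.66 Hz cutoff is an assumption: peaks are rebuilt as isolated Lorentzians from their picked position, prominence, and measured width, and the prominence threshold and function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

def lorentzian(freq_hz, center_hz, fwhm_hz, height):
    """Absorptive Lorentzian line with a given full width at half maximum."""
    hwhm = fwhm_hz / 2.0
    return height * hwhm**2 / ((freq_hz - center_hz) ** 2 + hwhm**2)

def rebuild_narrow_peaks(spectrum, hz_per_point, cutoff_hz=3.66, min_prominence=0.05):
    """Rebuild a spectrum from only those peaks narrower than the cutoff."""
    freq = np.arange(spectrum.size) * hz_per_point
    peaks, props = find_peaks(spectrum, prominence=min_prominence)
    widths_hz = peak_widths(spectrum, peaks, rel_height=0.5)[0] * hz_per_point
    rebuilt = np.zeros(spectrum.size)
    for idx, width, height in zip(peaks, widths_hz, props["prominences"]):
        if width < cutoff_hz:          # keep sharp peaks, drop broad residuals
            rebuilt += lorentzian(freq, freq[idx], width, height)
    return rebuilt

# Hypothetical CPMG-like spectrum: a sharp peak plus a residual broad signal.
hz_per_point = 12000.0 / 131072
freq = np.arange(8192) * hz_per_point
cpmg_like = lorentzian(freq, 200.0, 1.0, 1.0) + lorentzian(freq, 500.0, 20.0, 0.3)

label = rebuild_narrow_peaks(cpmg_like, hz_per_point)
idx_broad = np.argmin(np.abs(freq - 500.0))
print(label.max(), label[idx_broad])   # sharp peak retained, broad one removed
```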
In addition to accurately identifying the peak linewidth, it is also important to consider the overlapping characteristics of peaks across the spectrum. Therefore, this study developed a neural network architecture and loss function specifically tailored for this purpose.
Digitalization of NMR spectra
In NMR metabolomics research, a 600 MHz NMR spectrometer is usually the most cost-effective choice5,6, and indeed the majority of studies are performed at this frequency12. For metabolomics analysis of serum and plasma samples, standard procedures include sample preparation and NMR spectrum acquisition. During NMR acquisition, the spectral width is typically set to 30 ppm or 20 ppm5,6. Data processing begins with zero filling to obtain 128 k data points, followed by Fourier transformation. For a spectral width of 12,000 Hz (20 ppm at 600 MHz), the final digital resolution is <0.1 Hz per point. When generating training and validation data, with a spectral width of approximately 12,000 Hz and 128 k data points, the cut-off value for the half-height linewidth is set at 40 points, which is approximately 3.66 Hz. For NMR spectra acquired on non-600 MHz spectrometers, or if the spectral width is not 12,000 Hz, or if the number of points after zero filling is not 128 k, digital conversion should be performed so that the spectral width is approximately 12,000 Hz and the spectrum contains 128 k data points, at which point the digital resolution is approximately 0.0916 Hz per point.
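The arithmetic behind these numbers, together with one possible form of the digital conversion, is sketched below. The text does not specify how the conversion is carried out, so the linear interpolation onto the reference grid, the function name, and the toy input are assumptions.

```python
import numpy as np

SW_HZ = 12000.0            # reference spectral width
N_POINTS = 128 * 1024      # 128 k points after zero filling

hz_per_point = SW_HZ / N_POINTS      # 0.0915527... Hz per point
cutoff_hz = 40 * hz_per_point        # 40 points is about 3.66 Hz
print(round(hz_per_point, 4), round(cutoff_hz, 2))

def resample_to_reference(freq_hz, spectrum, center_hz, sw_hz=SW_HZ, n_points=N_POINTS):
    """Interpolate a spectrum onto the 12,000 Hz / 128 k reference grid.

    freq_hz must be in increasing order (required by np.interp); regions
    outside the measured range are filled with zeros.
    """
    step = sw_hz / n_points
    new_freq = center_hz + (np.arange(n_points) - n_points // 2) * step
    new_spec = np.interp(new_freq, freq_hz, spectrum, left=0.0, right=0.0)
    return new_freq, new_spec

# Hypothetical input: 14,000 Hz wide spectrum with 64 k points and one peak at 0 Hz.
freq_in = np.linspace(-7000.0, 7000.0, 65536)
spec_in = np.exp(-(freq_in / 5.0) ** 2)
new_freq, new_spec = resample_to_reference(freq_in, spec_in, center_hz=0.0)
```

Working on a frequency axis in Hz rather than ppm keeps the 40-point cutoff tied to the same physical linewidth regardless of the spectrometer field.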
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2018YFA0704002, 2018YFE0202300), the National Natural Science Foundation of China (21991081, 21921004, 21974149), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0540301).
Author contributions
X.X., B.J., and M.L. conceived the study and designed the experiments. X.X. designed and trained the neural network and analyzed the data. Q.W. contributed to data analysis and discussion. X.C. contributed to the NMR data processing. X.Z. contributed to data analysis and discussion. B.J. and M.L. provided overall supervision of the project and critical suggestions. X.X. and B.J. wrote the manuscript with contributions from all authors.
Peer review
Peer review information
Communications Chemistry thanks Sofia Moco and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
The data supporting the results of this study are available on reasonable request from the corresponding author, or from the MetaboLights database under accession numbers MTBLS374, MTBLS242, and MTBLS2387. Source data for Figs. 2–6 are provided in Supplementary Data 1.
Code availability
To validate and reproduce the results and apply the model in future studies, the profiling results, example data, data analysis workflows, and examples of model use are available at https://github.com/wipm-edasp/SENNet. The model can run on workstations equipped with GPUs or on the CPU of personal computers. You can use Anaconda to create a conda environment and run it on a Python-based Jupyter notebook. Operating environment: Python 3.8.13 or higher, PyTorch 1.7.1 or higher, and NumPy 1.22.3 or higher.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Bin Jiang, Email: jbin@apm.ac.cn.
Maili Liu, Email: ml.liu@apm.ac.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s42004-024-01251-x.
References
1. Nicholson, J. K., Lindon, J. C. & Holmes, E. ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29, 1181–1189 (1999). https://doi.org/10.1080/004982599238047
2. Fiehn, O. Metabolomics—the link between genotypes and phenotypes. Plant Mol. Biol. 48, 155–171 (2002). https://doi.org/10.1023/A:1013713905833
3. Bliziotis, N. G. et al. A comparison of high-throughput plasma NMR protocols for comparative untargeted metabolomics. Metabolomics 16, 64 (2020). https://doi.org/10.1007/s11306-020-01686-y
4. Nicholson, J. K. et al. Metabolic phenotyping in clinical and surgical environments. Nature 491, 384–392 (2012). https://doi.org/10.1038/nature11708
5. Vignoli, A. et al. High-Throughput Metabolomics by 1D NMR. Angew. Chem. Int. Ed. 58, 968–994 (2019). https://doi.org/10.1002/anie.201804736
6. Beckonert, O. et al. Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nat. Protoc. 2, 2692–2703 (2007). https://doi.org/10.1038/nprot.2007.376
7. Pugh, J. N. et al. Four weeks of probiotic supplementation alters the metabolic perturbations induced by marathon running: insight from metabolomics. Metabolites 11, 14 (2021). https://doi.org/10.3390/metabo11080535
8. Loo, R. L. et al. Quantitative in-vitro diagnostic NMR spectroscopy for lipoprotein and metabolite measurements in plasma and serum: recommendations for analytical artifact minimization with special reference to COVID-19/SARS-CoV-2 samples. J. Proteome Res. 19, 4428–4441 (2020). https://doi.org/10.1021/acs.jproteome.0c00537
9. Gralka, E. et al. Metabolomic fingerprint of severe obesity is dynamically affected by bariatric surgery in a procedure-dependent manner. Am. J. Clin. Nutr. 102, 1313–1322 (2015). https://doi.org/10.3945/ajcn.115.110536
10. Matzarapi, K. et al. NMR-based metabolic profiling of children with premature adrenarche. Metabolomics 18, 11 (2022). https://doi.org/10.1007/s11306-022-01941-4
11. Kaluarachchi, M. R., Boulangé, C. L., Garcia-Perez, I., Lindon, J. C. & Minet, E. F. Multiplatform serum metabolic phenotyping combined with pathway mapping to identify biochemical differences in smokers. Bioanalysis 8, 2023–2043 (2016). https://doi.org/10.4155/bio-2016-0108
12. Huang, K., Thomas, N., Gooley, P. R. & Armstrong, C. W. Systematic review of NMR-based metabolomics practices in human disease research. Metabolites 12, 25 (2022). https://doi.org/10.3390/metabo13010025
13. Soininen, P. et al. High-throughput serum NMR metabonomics for cost-effective holistic studies on systemic metabolism. Analyst 134, 1781–1785 (2009). https://doi.org/10.1039/b910205a
14. Lodge, S. et al. Diffusion and relaxation edited proton NMR spectroscopy of plasma reveals a high-fidelity supramolecular biomarker signature of SARS-CoV-2 infection. Anal. Chem. 93, 3976–3986 (2021). https://doi.org/10.1021/acs.analchem.0c04952
15. Mal, T. K., Tian, Y. & Patterson, A. D. Sample preparation and data analysis for NMR-based metabolomics. Methods Mol. Biol. 2194, 301–313 (2021). https://doi.org/10.1007/978-1-0716-0849-4_16
16. Nagana Gowda, G. A., Gowda, Y. N. & Raftery, D. Expanding the limits of human blood metabolite quantitation using NMR spectroscopy. Anal. Chem. 87, 706–715 (2015). https://doi.org/10.1021/ac503651e
17. Gowda, G. A. N. & Raftery, D. Quantitating metabolites in protein precipitated serum using NMR spectroscopy. Anal. Chem. 86, 5433–5440 (2014). https://doi.org/10.1021/ac5005103
18. Liu, M. L., Nicholson, J. K. & Lindon, J. C. High-resolution diffusion and relaxation edited one- and two-dimensional H-1 NMR spectroscopy of biological fluids. Anal. Chem. 68, 3370–3376 (1996). https://doi.org/10.1021/ac960426p
19. Wishart, D. S. et al. NMR and metabolomics a roadmap for the future. Metabolites 12, 678 (2022). https://doi.org/10.3390/metabo12080678
20. Hegele, R. A. Plasma lipoproteins: genetic influences and clinical implications. Nat. Rev. Genet. 10, 109–121 (2009). https://doi.org/10.1038/nrg2481
21. Rodriguez-Martinez, A. et al. J-resolved 1H NMR 1D-projections for large-scale metabolic phenotyping studies: application to blood plasma analysis. Anal. Chem. 89, 11405–11412 (2017). https://doi.org/10.1021/acs.analchem.7b02374
22. Takis, P. G., Jiménez, B., Sands, C. J., Chekmeneva, E. & Lewis, M. R. SMolESY: an efficient and quantitative alternative to on-instrument macromolecular 1H-NMR signal suppression. Chem. Sci. 11, 6000–6011 (2020). https://doi.org/10.1039/D0SC01421D
23. Chen, D., Wang, Z., Guo, D., Orekhov, V. & Qu, X. Review and prospect: deep learning in nuclear magnetic resonance spectroscopy. Chemistry 26, 10391–10401 (2020). https://doi.org/10.1002/chem.202000246
24. Klukowski, P. et al. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34, 2590–2597 (2018). https://doi.org/10.1093/bioinformatics/bty134
25. Hansen, D. F. Using deep neural networks to reconstruct non-uniformly sampled NMR spectra. J. Biomol. NMR 73, 577–585 (2019). https://doi.org/10.1007/s10858-019-00265-1
26. Li, D. W., Hansen, A. L., Yuan, C. H., Bruschweiler-Li, L. & Bruschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 13 (2021).
27. Xiao, X., Wang, Q., Zhang, X., Jiang, B. & Liu, M. Restore high-resolution nuclear magnetic resonance spectra from inhomogeneous magnetic fields using a neural network. Anal. Chem. 95, 16567–16574 (2023). https://doi.org/10.1021/acs.analchem.3c02688
28. Kaluarachchi, M. et al. A comparison of human serum and plasma metabolites using untargeted 1H NMR spectroscopy and UPLC-MS. Metabolomics 14, 32 (2018). https://doi.org/10.1007/s11306-018-1332-1
29. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
30. Chatterjee, S. & Zielinski, P. On the generalization mystery in deep learning. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.10036 (2022).
31. Ghosh, S., Das, N., Das, I. & Maulik, U. Understanding deep learning techniques for image segmentation. ACM Comput. Surv. 52, 1–35 (2019). https://doi.org/10.1145/3329784
32. Ronneberger, O., Fischer, P. & Brox, T. in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (eds. N. Navab, J. Hornegger, W.M. Wells & A.F. Frangi) 234–241 (Springer International Publishing, Cham; 2015).
33. Dey, N. et al. Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microsc. Res. Tech. 69, 260–266 (2006). https://doi.org/10.1002/jemt.20294
34. Liu, J., Sun, Y., Xu, X. & Kamilov, U. S. in ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (ICASSP, 2019).
35. Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).
36. Du, Y. et al. NMR spectroscopic approach reveals metabolic diversity of human blood plasma associated with protein–drug interaction. Anal. Chem. 85, 8601–8608 (2013). https://doi.org/10.1021/ac401738z
37. Liu, M., Tang, H., Nicholson, J. K. & Lindon, J. C. Use of 1H NMR-determined diffusion coefficients to characterize lipoprotein fractions in human blood plasma. Magn. Reson. Chem. 40, S83–S88 (2002). https://doi.org/10.1002/mrc.1121
38. Chai, X. et al. Combination of peak-picking and binning for NMR-based untargeted metabonomics study. J. Magn. Reson. 351, 107429 (2023). https://doi.org/10.1016/j.jmr.2023.107429
39. Mumcu, A. A different approach to the quantification of human seminal plasma metabolites using high-resolution NMR spectroscopy. J. Pharm. Biomed. Anal. 229, 115356 (2023). https://doi.org/10.1016/j.jpba.2023.115356
40. Dona, A. C. et al. Precision high-throughput proton NMR spectroscopy of human urine, serum, and plasma for large-scale metabolic phenotyping. Anal. Chem. 86, 9887–9894 (2014). https://doi.org/10.1021/ac5025039
41. Dame, Z. T. et al. The human saliva metabolome. Metabolomics 11, 1864–1883 (2015). https://doi.org/10.1007/s11306-015-0840-5
42. Quartieri, E. et al. Sample optimization for saliva 1H-NMR metabolic profiling. Anal. Biochem. 640, 114412 (2022). https://doi.org/10.1016/j.ab.2021.114412
43. Valsecchi, V. et al. SMN deficiency perturbs monoamine neurotransmitter metabolism in spinal muscular atrophy. Commun. Biol. 6, 1155 (2023). https://doi.org/10.1038/s42003-023-05543-1
44. Maillet, S. et al. Experimental protocol for clinical analysis of cerebrospinal fluid by high resolution proton magnetic resonance spectroscopy. Brain Res. Protoc. 3, 123–134 (1998). https://doi.org/10.1016/S1385-299X(98)00033-6
45. Ghosh, N. et al. Global metabolome profiling of exhaled breath condensates in male smokers with asthma COPD overlap and prediction of the disease. Sci. Rep. 11, 16664 (2021). https://doi.org/10.1038/s41598-021-96128-7
46. Jiménez, B. et al. Quantitative lipoprotein subclass and low molecular weight metabolite analysis in human serum and plasma by 1H NMR spectroscopy in a multilaboratory trial. Anal. Chem. 90, 11962–11971 (2018). https://doi.org/10.1021/acs.analchem.8b02412
47. Masuda, R. et al. Plasma lipoprotein subclass variation in middle-aged and older adults: sex-stratified distributions and associations with health status and cardiometabolic risk factors. J. Clin. Lipidol. 17, 677–687 (2023). https://doi.org/10.1016/j.jacl.2023.06.004
48. Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191