PLOS One. 2025 Aug 6;20(8):e0321239. doi: 10.1371/journal.pone.0321239

MSMCE: A novel representation module for classification of raw mass spectrometry data

Fengyi Zhang 1, Boyong Gao 1, Yinchu Wang 2,3,4, Lin Guo 2,3,4, Wei Zhang 2,3,4, Xingchuang Xiong 2,3,4,*
Editor: Hirenkumar Kantilal Mewada 5
PMCID: PMC12327681  PMID: 40768503

Abstract

Mass spectrometry (MS) analysis plays a crucial role in the biomedical field; however, the high dimensionality and complexity of MS data pose significant challenges for feature extraction and classification. Deep learning has become a dominant approach in data analysis, and while some deep learning methods have achieved progress in MS classification, their feature representation capabilities remain limited. Most existing methods rely on single-channel representations, which struggle to effectively capture structural information within MS data. To address these limitations, we propose a Multi-Channel Embedding Representation Module (MSMCE), which focuses on modeling inter-channel dependencies to generate multi-channel representations of raw MS data. Additionally, we implement a feature fusion mechanism by concatenating the initial encoded representation with the multi-channel embeddings along the channel dimension, significantly enhancing the classification performance of subsequent models. Experimental results on four public datasets demonstrate that the proposed MSMCE module not only achieves substantial improvements in classification performance but also enhances computational efficiency and training stability, highlighting its effectiveness in raw MS data classification and its potential for robust application across diverse datasets.

Introduction

Mass spectrometry (MS) is a highly versatile and powerful analytical tool used for detecting, characterizing, and quantifying various analytes based on their observed mass-to-charge ratio (m/z) [1]. With technological advancements, the rate of MS data generation and its complexity have increased significantly. The massive volume and high dimensionality of MS data, coupled with inherent noise and significant signal variability (including peak shifts and intensity fluctuations across samples), pose considerable challenges for data analysis. Interpreting these complex MS datasets rapidly and accurately has become a major challenge in MS data analysis.

In the field of MS data classification, traditional machine learning algorithms, such as Support Vector Machines (SVM), logistic regression, Random Forest, and XGBoost, have been widely applied and often serve as performance benchmarks [2–7]. However, applying these algorithms effectively to MS data typically relies on a series of complex preprocessing steps. These steps address inherent issues in MS signal acquisition, such as peak shifts across different mass spectra, baseline drift, noise interference, and variations in signal intensity, thereby extracting more interpretable biological features to support subsequent biological analysis and enhance the robustness of machine learning models [8]. Meticulous preprocessing, including denoising, baseline correction, peak detection, and alignment, is therefore usually essential. Concurrently, some research has begun to explore strategies for directly utilizing raw MS data for deep learning classification [9,10], aiming to leverage the ability of deep learning models to automatically learn data representations. Nevertheless, both traditional methods and methods that use raw MS data directly must address the inherent high dimensionality of MS data (a large number of m/z features) and the typically limited sample sizes in specific cohorts [11]. These characteristics add complexity, making it particularly critical to develop robust models without risking overfitting or information loss due to aggressive feature selection. This persistent need to learn discriminative representations effectively motivates the exploration of alternative methods that can learn representations more directly from raw MS data.

In recent years, deep learning has emerged as a dominant approach in data analysis, achieving breakthrough advancements in model architectures such as Convolutional Neural Networks (CNNs) (e.g., ResNet [12], DenseNet [13], and EfficientNet [14]) and sequence modeling networks like LSTM [15] and Transformer [16]. Its rapid development has surpassed the limitations of traditional machine learning in many aspects. Consequently, researchers have actively explored its applications in the field of mass spectrometry, encompassing various areas such as disease diagnosis [17,18] and peptide sequencing and identification [19–24]. A key advantage is that deep learning can capture complex patterns and latent features from raw MS data [24], showcasing its significant potential in MS classification. However, a core challenge in MS data analysis remains how to quantitatively and effectively represent MS vectors. To address this challenge, particularly while maintaining high analytical performance at lower computational costs, many deep learning methods focus on embedding high-dimensional MS vectors into lower-dimensional spaces. These embedding methods have varied focuses; some research utilizes graph neural networks to embed molecular structural information for mass spectrometry prediction [25], while other studies have developed advanced embedding methods for tasks such as mass spectrometry clustering and similarity assessment. For example, Spec2Vec [26], inspired by representation learning techniques from natural language processing, learns embedded representations of MS vectors from large-scale MS data, verifying that the relationships between vector fragments can reflect the structural similarity of different compounds. MS2DeepScore [27] employs a Siamese neural network to learn low-dimensional embeddings of MS vectors that are used for predicting the structural similarity between chemical compounds. GLERMS [28] enhances the low-dimensional representation of MS vectors through contrastive learning, significantly improving compound identification and MS clustering performance.

In the field of image classification, multi-channel images leverage complementary information across different channels and can therefore provide a more comprehensive description of the target than single-channel images [29]. In this context, ‘channels’ typically refer to color channels (e.g., Red, Green, Blue), where each pixel in an image is formed by a combination of values from these color channels. We posit that multi-channel representations can offer analogous benefits for high-dimensional MS data. Compared to single-channel representations, multi-channel representations, by fostering inter-channel information correlation, can generate more expressive features. This enables the model to learn deeper patterns that are often unrevealed by single-channel representations, thereby enhancing the model’s capacity to recognize complex data patterns and improving classification performance. CNNs are exceptionally well-suited for such multi-channel representation learning due to their powerful feature extraction capabilities. CNNs can extract latent features from MS vectors by capturing spatial invariance (such as translation and scale invariance) and encoding these into feature maps, with pooling operations further refining structural features [30]. By stacking convolutional layers and applying nonlinear transformations, CNNs can dynamically generate multi-channel embedding representations, effectively capturing information from different aspects and hierarchical levels within MS data. In our work, ‘channel’ specifically refers to a distinct feature map produced by a convolutional layer. Each such channel embodies a learned embedding representation or a filtered perspective of the input data. These multi-channel embedding representations not only enrich feature diversity but also enhance the model’s capability to process high-dimensional data. The inherent ability of CNNs to integrate feature extraction with classification contributes to their robustness, allowing them to maintain high accuracy even when processing MS data corrupted by noise or signal shifts. CNNs have been widely applied to MS data classification tasks and have demonstrated excellent performance [31–34].

Therefore, this study aims to design a Multi-Channel Embedding Representation Module that constructs inter-channel dependencies to generate more expressive feature representations, thereby improving both the classification accuracy and generalization ability of models for raw MS data. It is important to emphasize that the data processing methodology adopted in this study differs from traditional feature engineering, which aims to extract interpretable biological features. Our work is an extension of existing research that directly utilizes raw MS data for deep learning classification [9,10], focusing on optimizing the feature representation when raw MS data is input into deep learning models. The main contributions of this study are as follows:

  1. Multi-Channel Embedding Representation for Mass Spectrometry Data: By representing a one-dimensional MS vector as a multi-channel embedded representation, the proposed method enriches the feature expressiveness of the raw data, enabling the model to capture deeper associations between different features during the learning process. Additionally, the dimensionality of the embedded representation output by the MSMCE module is more aligned with the input expectations of convolutional and sequence models.

  2. Learnable Representation Module for End-to-End Training Framework: The proposed MSMCE module is a learnable representation module capable of integration with various deep learning classifiers, constructing an end-to-end training framework. This enables the joint optimization of feature representation learning and classification tasks, automatically learning task-specific feature representations within a unified training process to enhance classifier performance.

  3. Enhanced Computational Efficiency and Training Stability: While improving classification accuracy, the MSMCE module also contributes to computational efficiency. For CNN architectures, MSMCE reduces computational demands and GPU memory usage through its efficient embedding mechanism. For sequence models, the structured input provided by MSMCE significantly improves training stability when handling high-dimensional MS data, avoiding potential “mode collapse”.

Materials and methods

Multi-channel embedding module (MSMCE)

The core mechanism of the MSMCE module lies in performing dynamic feature representation learning on input MS vectors. Specifically, it is designed as a preceding feature learning unit for deep learning classification models. It accepts a batch of 1D MS vectors, $X \in \mathbb{R}^{B \times D}$, where $B$ represents the batch size and $D$ denotes the dimensionality of each MS vector. Through its internal learnable parameters, MSMCE dynamically transforms each input 1D MS vector (i.e., single-channel) into a more information-rich and expressive multi-channel embedded representation. This process is designed to capture both global contextual information and local fine-grained structural features within the data, thereby providing a more discriminative feature input for downstream classification tasks. Fig 1 illustrates the structural design of the proposed MSMCE module.

Fig 1. Structural diagram of the multi-channel embedding (MSMCE) module.

Fig 1

This module consists of three main components: Encoder, Channel Embedding, and Channel Concatenation, which are designed to enhance the feature representation capability of mass spectrometry data. Note: This diagram illustrates the processing of a single MS vector through the module; in practice, the module processes a batch of B such vectors concurrently.

Fully connected encoder.

The input matrix X first passes through a two-layer fully connected network. After the first linear transformation, X undergoes layer normalization, ReLU (Rectified Linear Unit) activation, and Dropout. ReLU is an activation function commonly used in deep learning that introduces non-linearity by outputting the input directly if it is positive and zero otherwise; this helps mitigate the vanishing gradient problem and often improves computational efficiency. Dropout is a regularization technique applied during training where a random proportion of neuron outputs in a layer are temporarily ignored, which helps to prevent the model from overfitting to the training data and thus enhances its generalization ability to unseen data. These components collectively contribute to robust feature learning and model stability. Following these operations, this initial two-layer network effectively compresses the dimensionality of the input vectors and extracts global features to generate an initial encoded vector, providing a more refined feature representation for the subsequent channel embedding module. The mathematical formulation is as follows:

$$H_1 = \mathrm{ReLU}(\mathrm{LayerNorm}(X W_1 + b_1)) \tag{1}$$

$$H_1 = \mathrm{Dropout}(H_1) \tag{2}$$

$$E = H_1 W_2 + b_2 \tag{3}$$

Here, $W_1 \in \mathbb{R}^{D \times 2048}$ is the weight matrix of the first fully connected layer; it linearly projects the $D$-dimensional input MS vector $X$ into a 2048-dimensional intermediate feature space. $b_1 \in \mathbb{R}^{2048}$ is the bias vector for this layer; following the linear transformation $X W_1$, it is added to each of the 2048 dimensions of the intermediate result. $W_2 \in \mathbb{R}^{2048 \times d}$ is the weight matrix of the second fully connected layer, and $b_2 \in \mathbb{R}^{d}$ is the bias vector for this layer. The resulting latent embedding representation $E \in \mathbb{R}^{B \times d}$ is the global embedding representation of the MS data, where $d$ is the embedding dimension.

The fully connected encoder aims to reduce the dimensionality of high-dimensional MS vectors through nonlinear transformations, preserving key features while eliminating redundant information. This results in a more compact feature representation, facilitating more effective downstream processing.

Channel embedding module.

For the subsequent extraction of local patterns from the global embedding representation E, it is reshaped to introduce a channel dimension:

$$E \in \mathbb{R}^{B \times d} \;\longrightarrow\; E \in \mathbb{R}^{B \times 1 \times d} \tag{4}$$

The reshaped tensor is then processed through two consecutive 1D convolutional layers to further extract local features, ultimately producing a multi-channel embedding representation. After the first convolutional layer, batch normalization, ReLU activation, and Dropout are applied to support the model’s generalization capability. The second convolutional layer takes the output of the first layer and extracts deeper local features. Both convolutional layers use the same kernel size to maintain consistency between input and output along the length dimension, ensuring feature scale invariance. The mathematical formulation of the Channel Embedding module is as follows:

$$C_1 = \mathrm{ReLU}(\mathrm{BatchNorm}(\mathrm{Conv1D}(E, K_1))) \tag{5}$$

$$C_2 = \mathrm{Conv1D}(\mathrm{Dropout}(C_1), K_2) \tag{6}$$

Here, $K_1$ is the kernel for the first 1D convolutional layer with shape $(\frac{C}{2}, 1, 3)$, where out_channels $= \frac{C}{2}$, in_channels $= 1$, and kernel_size $= 3$. The output $C_1 \in \mathbb{R}^{B \times \frac{C}{2} \times d}$ is an intermediate latent embedding from the channel embedding module. $C$ denotes the total number of output channels of this module. $K_2$ is the kernel for the second 1D convolutional layer with shape $(C, \frac{C}{2}, 3)$, where out_channels $= C$, in_channels $= \frac{C}{2}$, and kernel_size $= 3$; its output, $C_2 \in \mathbb{R}^{B \times C \times d}$, constitutes the final multi-channel embedding representation, which captures local patterns and fine-grained details from the initial global embedding $E$.

The Channel Embedding module is designed to leverage convolution operations to capture the local patterns and fine-grained details from the input data. It transforms the global embedding representation into a multi-channel embedding representation, thereby providing a structured format that is dimensionally more aligned with downstream classification tasks.

Channel concatenation.

To generate the final output tensor, the initial global embedding $E$ is concatenated with the multi-channel embedding representation $C_2$ along the channel dimension. This concatenation operation allows for the direct integration of the global context captured by $E$ with the detailed local patterns learned by $C_2$. The resulting fused embedding representation $O$ thus combines information from both the encoder and the channel embedding module, providing a richer and more comprehensive input for subsequent classification tasks. The channel concatenation can be expressed as:

$$O = \mathrm{Concat}(E, C_2) \in \mathbb{R}^{B \times (1 + C) \times d} \tag{7}$$

Here, O represents the final output tensor of the MSMCE module. The fused embedding representation is more diverse and representative, effectively capturing the key feature information within MS data while significantly reducing the dimensionality of the original MS vectors, thereby improving computational efficiency.

The proposed MSMCE module provides an efficient and robust method for representation learning directly from raw MS data. The module’s core innovation lies in its ability to first compress high-dimensional, sparse MS vectors to extract key global contextual information. Subsequently, using cascaded convolutional layers, it mines this global representation to capture local, fine-grained patterns that are difficult to discern in the raw MS data. By combining these two capabilities, global summarization and local detail extraction, MSMCE generates more comprehensive and hierarchical feature representations, thereby enhancing its capacity to characterize complex MS data.
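The shape bookkeeping across Eqs (1) to (7) can be traced with a small sketch. It is not the authors' implementation: the function names are illustrative, both convolutions are assumed to use stride 1 and padding 1 (the text only states that kernel size 3 preserves the length dimension), the intermediate channel count is read as C/2, and the input dimensionality D = 12000 is a hypothetical value. The defaults d = 1024 and C = 256 are the settings reported in the training strategy.

```python
def conv1d_out_len(length, kernel_size=3, padding=1, stride=1):
    """Output length of a 1D convolution (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def msmce_shapes(batch, input_dim, embed_dim=1024, channels=256, hidden=2048):
    """Return (name, shape) pairs for the tensor after each MSMCE stage."""
    half = channels // 2                # assumed intermediate channel count C/2
    len1 = conv1d_out_len(embed_dim)    # length after the first convolution
    len2 = conv1d_out_len(len1)         # length after the second convolution
    return [
        ("X", (batch, input_dim)),
        ("H1 (Eqs 1-2)", (batch, hidden)),
        ("E (Eq 3)", (batch, embed_dim)),
        ("E reshaped (Eq 4)", (batch, 1, embed_dim)),
        ("C1 (Eq 5)", (batch, half, len1)),
        ("C2 (Eq 6)", (batch, channels, len2)),
        ("O (Eq 7)", (batch, 1 + channels, len2)),
    ]


# Hypothetical input: batch of 64 spectra with D = 12000 m/z bins.
for name, shape in msmce_shapes(batch=64, input_dim=12000):
    print(f"{name:20s} {shape}")
```

Under these assumptions the module output has 1 + C = 257 channels of length d = 1024, which is the channel count a downstream CNN's first convolution would need to accept.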

Dataset description

This study evaluates the proposed method using four publicly available datasets. Table 1 provides detailed information on all datasets, including mass spectrometry instruments used and the number of classified samples after data processing.

Table 1. Dataset information.

| Dataset | Instrument | Classes | Files | Mass range (m/z) |
| --- | --- | --- | --- | --- |
| Canine sarcoma | Synapt G2-S Q-TOF | Healthy | 40 | 100–1600 |
| | | Myxosarcoma | 5 | |
| | | Fibrosarcoma | 30 | |
| | | Hemangiopericytoma | 10 | |
| | | Malignant peripheral nerve tumor | 5 | |
| | | Osteosarcoma | 25 | |
| | | Undifferentiated pleomorphic | 25 | |
| | | Rhabdomyosarcoma | 5 | |
| | | Splenic fibrohistiocytic nodules | 5 | |
| | | Histiocytic sarcoma | 5 | |
| | | Soft tissue sarcoma | 5 | |
| | | Gastrointestinal stromal sarcoma | 5 | |
| NSCLC | LTQ Orbitrap Elite | ADC | 6 | 400–1600 |
| | | SCC | 6 | |
| CRLM | Orbitrap Fusion Lumos | Control | 30 | 400–1600 |
| | | CRLM | 30 | |
| RCC | Q Exactive HF | Control | 174 | 70–1060 |
| | | RCC | 82 | |

  1. Canine Sarcoma Dataset [35]: This dataset contains 1 healthy and 11 sarcoma histology types. It can be formulated as either a binary classification task or a 12-class classification task. This study analyzes data acquired in positive ion mode.

  2. NSCLC Dataset [36]: This dataset includes the two major histological subtypes of non-small cell lung cancer (NSCLC), namely adenocarcinoma (ADC) and squamous cell carcinoma (SCC), and is utilized herein for a binary classification task.

  3. CRLM Dataset [37]: This dataset consists of colorectal liver metastases (CRLM) tissues and normal liver tissues, serving as a binary classification task.

  4. RCC Dataset [38]: This dataset includes samples from renal cell carcinoma (RCC) patients and healthy control subjects, formulated as a binary classification task. This study analyzes data acquired in positive ion mode.

Data processing workflow

Fundamentally, a single mass spectrum sample in raw mass spectrometry data can be conceptualized as a sequence of m/z and corresponding intensity value pairs: $S = [((m/z)_1, I_1), ((m/z)_2, I_2), \ldots, ((m/z)_n, I_n)]$. During MS acquisition, the data evolves with retention time (RT), meaning each mass spectrum sample is associated with a specific time point $t_i$. Consequently, a complete MS file can be viewed as a series of mass spectrum samples over time, $T = [(S_1, t_1), (S_2, t_2), \ldots, (S_n, t_n)]$, where each $S_j$ is itself a high-dimensional intensity vector across numerous m/z values. This inherent complexity, with variations along both the m/z and RT dimensions, necessitates structured processing to generate a consistent input format for deep learning models.

The data processing workflow in this study is based on related work that directly utilizes raw mass spectrometry data for classification. The overall pipeline, illustrating the complete process from the input of a mass spectrometry file to the generation of the feature matrix, is depicted in Fig 2. For LC-MS datasets, including NSCLC, CRLM, and RCC, we referenced the data processing method described in [9]. Raw mass spectra read from the MS files were first binned along the m/z axis using a bin width of 0.1 Da, thereby generating a fixed number of m/z features for each dataset. For the RT dimension, mass spectra were aggregated within 10-second binning windows. Specifically, within each 10-second binning window, the intensity values for each corresponding m/z dimension from all mass spectra were summed and then averaged to create a representative mass spectrum sample for that RT binning window. For SpiderMass datasets, such as the Canine Sarcoma dataset, the data processing workflow follows the method outlined in [10]. Initially, mass spectra with a total ion count (TIC) below $1 \times 10^{4}$ were filtered out to ensure data quality. Subsequently, the retained mass spectra were also binned along the m/z axis with a bin width of 0.1 Da to achieve a uniform data dimension, ensuring comparability between different samples. In summary, whether it involves m/z binning and RT window aggregation for LC-MS datasets, or TIC filtering and m/z binning for SpiderMass datasets, the processing ultimately creates a two-dimensional feature matrix for each mass spectrometry file. In this matrix, rows correspond to the mass spectrum samples that have undergone RT window aggregation or filtering and retention, while columns correspond to the m/z feature dimension. Each row in the two-dimensional feature matrix (i.e., a processed mass spectrum instance) is then treated as an independent input sample for the training of subsequent deep learning models. Fig 3 displays the t-SNE dimensionality reduction visualizations of the samples from all datasets after the data processing workflow, intuitively revealing the distinct intrinsic structures and class distribution characteristics among the different datasets.
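As a rough illustration of the LC-MS branch of this workflow, the sketch below bins peaks at 0.1 Da and averages the binned spectra inside each 10-second RT window. It is a simplified stand-in, not the pipeline from [9]: the function names and the input layout (a list of `(rt_seconds, peaks)` tuples) are hypothetical, and summing intensities that fall into the same m/z bin is an assumption, since the text does not say how within-bin peaks are combined.

```python
def bin_spectrum(peaks, mz_min, mz_max, bin_width=0.1):
    """Map (m/z, intensity) peaks onto a fixed-length intensity vector.

    Peaks landing in the same bin are summed (an assumption of this sketch).
    """
    n_bins = round((mz_max - mz_min) / bin_width)
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            vector[idx] += intensity
    return vector


def aggregate_by_rt(scans, mz_min, mz_max, rt_window=10.0):
    """Average binned spectra within consecutive retention-time windows.

    `scans` is a list of (rt_seconds, peaks); one row is returned per
    non-empty window, ordered by time.
    """
    windows = {}
    for rt, peaks in scans:
        key = int(rt // rt_window)
        windows.setdefault(key, []).append(bin_spectrum(peaks, mz_min, mz_max))
    rows = []
    for key in sorted(windows):
        group = windows[key]
        rows.append([sum(col) / len(group) for col in zip(*group)])
    return rows


# Toy example: three scans, two of which share the first 10-second window.
scans = [
    (1.0, [(400.05, 10.0), (400.25, 4.0)]),
    (7.0, [(400.07, 30.0)]),
    (12.0, [(400.25, 8.0)]),
]
feature_matrix = aggregate_by_rt(scans, mz_min=400.0, mz_max=401.0)
print(len(feature_matrix), "rows x", len(feature_matrix[0]), "m/z bins")
```

With a 0.1 Da bin width, a 400–1600 mass range yields 12000 m/z features per row, which gives a sense of the input dimensionality D that the downstream models face.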

Fig 2. Mass spectrometry data processing workflow.

Fig 2

This figure illustrates the preprocessing pipeline for mass spectrometry data, covering both SpiderMass and LC-MS data processing methods. The final output is a Feature Matrix, which serves as the input for subsequent analyses.

Fig 3. t-SNE visualization of all studied datasets.

Fig 3

This figure shows the two-dimensional t-SNE visualizations for the four datasets used in this study, illustrating their intrinsic data structures and class separability. (a) Canine Sarcoma dataset shown as a binary classification view; (b) Canine Sarcoma dataset shown as a 12-class view with all sarcoma subtypes; (c) NSCLC dataset; (d) CRLM dataset; (e) RCC dataset.

Training strategy

For each dataset, we first partition the file list of the raw MS data. Specifically, stratified sampling is performed based on the class labels associated with each file, dividing the list into a training set (90%) and a fixed hold-out test set (10%), thereby minimizing potential biases in subsequent model training and evaluation due to class distribution imbalances. Subsequently, all files within both the training and test sets were independently subjected to the data processing workflow. During this process, each file was transformed into multiple representative mass spectrum instances. All feature rows derived from the training set were then stacked to form the final training set feature matrix; similarly, all feature rows from the test set were used to construct the test set feature matrix in the same manner. Finally, both of these resulting feature matrices, prior to model input, were subjected to TIC normalization to reduce the data’s dynamic range and improve feature comparability across samples. The training set was then used for K-fold stratified cross-validation, with K set to 6. For each of the K folds, the data was further divided into a training fold ($\frac{K-1}{K}$ of the training set, i.e., 5/6) and a validation fold ($\frac{1}{K}$ of the training set, i.e., 1/6), with a random seed set to ensure that all models evaluated on the same dataset used identical fold splits.
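TIC normalization is commonly implemented by dividing each spectrum by the sum of its intensities. The text does not give the exact formula used, so the sketch below shows this common form under that assumption; the function name is illustrative.

```python
def tic_normalize(rows):
    """Scale each spectrum so its intensities sum to 1 (total ion count).

    All-zero rows are returned unchanged to avoid division by zero.
    """
    normalized = []
    for row in rows:
        tic = sum(row)
        if tic > 0:
            normalized.append([value / tic for value in row])
        else:
            normalized.append(list(row))
    return normalized


print(tic_normalize([[2.0, 6.0], [0.0, 0.0]]))
```

After this step every non-empty row sums to 1, so spectra acquired at very different overall intensities become directly comparable.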

For all deep learning experiments, the MSMCE module was configured with 256 embedding channels and an embedding dimension of 1024. Models were trained using the Adam optimizer with an initial learning rate of $1 \times 10^{-3}$ and a weight decay (L2 regularization) coefficient of $1 \times 10^{-5}$ to mitigate overfitting. To address potential class imbalance within each training fold, class weights were computed and applied to the cross-entropy loss function. These weights were automatically set based on the sample count for each class: classes with fewer samples were assigned higher weights, while classes with more samples received lower weights. This strategy aimed to balance the contribution of different classes to the loss function during training, thereby alleviating the impact of class imbalance on the final classification results. Additionally, a learning rate scheduler (ReduceLROnPlateau) was configured to automatically adjust the learning rate when validation fold performance stagnated. Specifically, if performance did not improve for 5 epochs, the learning rate was decayed by a factor of 0.1, thus improving training stability and model convergence efficiency. An early stopping mechanism based on validation loss, with a patience of 10 epochs, was employed to prevent further overfitting and to dynamically preserve the model weights that achieved the best performance on the validation fold.
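The text states only that rarer classes receive larger weights, not the exact formula. One widely used choice that has this property is the inverse-frequency heuristic w_c = n / (k · n_c), with n samples, k classes, and n_c samples in class c. The sketch below uses it as an assumption, with an illustrative function name and made-up label counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical training fold with a 3:1 class imbalance.
labels = ["control"] * 300 + ["tumor"] * 100
print(balanced_class_weights(labels))
```

In this example the minority class is weighted 2.0 and the majority class about 0.67, so each class contributes equally to the weighted loss in aggregate.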

All deep learning models were trained for a maximum of 64 epochs, unless halted earlier by the early stopping mechanism. For machine learning models, standard implementations with default parameter configurations were used for baseline comparison. Finally, the performance of each model was evaluated on the independent and fixed hold-out test set, and statistical tests were conducted based on the corresponding performance metrics. All stochastic processes, including data partitioning and K-fold generation, were controlled by a unified global random seed to ensure the reproducibility of the experiments.

Training process

To validate the effectiveness of MSMCE in feature representation, we integrate the proposed module with various mainstream deep learning architectures, including CNNs such as ResNet, EfficientNet, and DenseNet, as well as sequence modeling networks like LSTM and Transformer. These models have been widely applied across different tasks, each possessing unique structural characteristics and feature extraction capabilities. It is important to emphasize that this study does not adopt an ensemble learning strategy. Instead, the MSMCE module serves as a feature representation layer that is directly connected to classification models at the structural level, optimizing feature input for improved classification performance.

To ensure compatibility of the MSMCE module with various types of classification models, we adapted the input layers of each model accordingly. For CNN models, we modified the number of input channels in their initial convolutional layer to match the number of channels in the multi-channel embedding representation output by the MSMCE module. This ensured that the generated embedding representation could be directly and effectively processed by standard convolutional operations. For sequence models, in their baseline configuration, the model received the single-channel MS vector and segmented it to form a sequence input. Specifically, the LSTM model processed this segmented sequence vector through its recurrent layered architecture, extracting features from the sequence’s final state for classification. The Transformer model, on the other hand, first created an embedding representation for the segmented sequence, prepended a classification token (CLS Token) to the head of the sequence, and then utilized its self-attention mechanism to learn contextual representations of the sequence, ultimately performing classification based on this classification token. When integrated with MSMCE, both sequence models directly used the MSMCE output as their input sequence, where the embedded channels were treated as the sequence length, and the embedding dimension corresponded to the feature length of each element in the sequence.

As shown in Fig 4, the multi-channel features generated by the MSMCE module are directly fed into the classification model, forming an end-to-end training process. During model training, the MSMCE module and the classification model are jointly optimized, with parameter gradients updated simultaneously under a unified loss function. This approach does not rely on manual feature engineering but instead leverages a deep embedding mechanism to automatically extract and organize feature information from the data, enabling a learnable, data-driven representation of the input. The objective is to ensure that different components of the model work collaboratively, maximizing the synergy between the representation module and the classification model, ultimately achieving efficient modeling and classification of raw MS data.

Fig 4. Comparison between the training process with MSMCE and without MSMCE.

Fig 4

In the blue path, the input data first passes through the MSMCE module for feature transformation before being fed into the classification model for training. The loss gradient optimizes both the classification model and the MSMCE module. In the red path, the input data is directly fed into the classification model, and the loss gradient is only used to optimize the classification model.

Results

To evaluate the effectiveness of the proposed MSMCE module, we conducted a systematic comparison between models incorporated with the MSMCE module and their original counterparts without MSMCE across four different datasets. For all experiments, the batch size for the NSCLC, CRLM, and RCC datasets was set to 64, while the batch size for the Canine Sarcoma dataset was set to 32. For all multi-class classification tasks reported in this paper, the overall Accuracy and F1-Score metrics were obtained by first calculating these metrics for each class independently and then taking their unweighted average (‘macro’ averaging).
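Macro averaging can be made concrete with a short sketch (the function name is illustrative). It also shows, as a property of the metric rather than a claim about any specific model, that a classifier predicting a single class for every sample of a balanced binary dataset scores an accuracy of 0.5 but a macro F1-Score of only 1/3; values of this kind appear in some baseline rows of Tables 2 and 3.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (F1 = 2TP / (2TP + FP + FN))."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Balanced binary labels, degenerate classifier that always predicts class 0.
y_true = [0] * 50 + [1] * 50
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 4))
```

The predicted class reaches an F1 of 2/3 and the ignored class an F1 of 0, so their unweighted mean is 1/3.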

To assess the statistical significance of performance improvements, we employed the paired Wilcoxon signed-rank test to compare models integrated with MSMCE against their respective baselines. The paired samples for each test consisted of the performance scores from the 6 individual folds of the cross-validation (i.e., N = 6 for each test). All p-values were calculated from a two-sided test, and a value of less than 0.05 was considered statistically significant.
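With only six paired fold scores, the exact two-sided Wilcoxon signed-rank test has a floor on its p-value: if all six differences share the same sign, p = 2/2^6 = 0.03125, which rounds to 0.03. The sketch below computes the exact p-value by enumerating sign assignments. It is an illustrative implementation that assumes no zero differences and no tied absolute differences; the fold scores are invented for demonstration and are not the paper's data.

```python
from itertools import product


def wilcoxon_exact_two_sided(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Assumes no zero differences and no ties among absolute differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign assignment is equally likely.
    hits = 0
    n_assignments = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        n_assignments += 1
        if min(w, total - w) <= observed:
            hits += 1
    return hits / n_assignments


# Hypothetical per-fold accuracies; every fold improves with the module.
baseline = [0.80, 0.81, 0.82, 0.83, 0.84, 0.85]
with_module = [0.81, 0.83, 0.85, 0.87, 0.89, 0.91]
print(wilcoxon_exact_two_sided(with_module, baseline))
```

A reported p = 0.03 with N = 6 is therefore the strongest result this test can produce, and a single fold moving in the opposite direction already pushes the exact p-value above 0.05.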

Large-scale binary classification datasets

The NSCLC, CRLM, and RCC datasets have large sample sizes and simple class structures, as they are all binary classification tasks. Furthermore, these datasets exhibit relatively balanced class distribution, providing an ideal testing environment for evaluating the feature representation learning capability of the proposed MSMCE module. Conducting experiments on these large-scale datasets aims to validate the effectiveness and robustness of the MSMCE approach when dealing with high-volume data, particularly in assessing its comprehensive performance in extracting discriminative features, improving classification accuracy, and handling balanced data distributions effectively.

The experimental results demonstrate that deep learning models integrated with the MSMCE module achieved improvements in classification performance across all three datasets compared to their original counterparts, most of them statistically significant. As shown in Table 2, on the NSCLC dataset, MSMCE-ResNet50 achieved an accuracy of 0.9785, a statistically significant relative increase of 1.28% (p = 0.03, N = 6) compared to the original model accuracy of 0.9661. For the EfficientNetB0 model, the high p-value (p = 0.84) suggests that the baseline model’s performance was already approaching a “performance ceiling” on this dataset, leaving little room for statistically significant improvement from the MSMCE module. This is further corroborated by the relatively simple intrinsic structure of the dataset, as revealed by its t-SNE visualization in Fig 3(c), where the different classes already exhibit clear separability. As detailed in Table 3, on the CRLM dataset, the accuracy of MSMCE-DenseNet121 improved by 6.66% (p = 0.03, N = 6) relative to DenseNet121. Furthermore, as presented in Table 4, on the RCC dataset, the F1-Score for DenseNet121 increased from 0.6946 to 0.7952, a relative improvement of 14.48% (p = 0.03, N = 6). For the EfficientNetB0 and LSTM models, the non-significant p-values may reflect a mismatch between their specific architectures and the intrinsic characteristics of the data. When a model’s inductive bias (e.g., spatial locality for CNNs or sequential dependency for LSTMs) is not key to effective discrimination for this dataset, the performance gains from MSMCE’s enhanced representation can be limited by the classifier itself, thus not reaching statistical significance.
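The percentage gains quoted here are relative to the baseline value rather than absolute differences in percentage points, as the following arithmetic on the tabulated means shows:

```latex
\[
\frac{0.9785 - 0.9661}{0.9661} \approx 1.28\%, \qquad
\frac{0.9219 - 0.8643}{0.8643} \approx 6.66\%, \qquad
\frac{0.7952 - 0.6946}{0.6946} \approx 14.48\%.
\]
```

The corresponding absolute differences are 1.24, 5.76, and 10.06 percentage points, respectively.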

Table 2. Comparison of classification performance of different models on the NSCLC dataset.

| NSCLC (Classes = 2) | Baseline Accuracy | Baseline F1-Score | MSMCE Accuracy | MSMCE F1-Score |
|---|---|---|---|---|
| RF | 0.9935 ± 0.00 | 0.9935 ± 0.00 | | |
| SVM | 0.9690 ± 0.00 | 0.9690 ± 0.00 | | |
| LDA | 0.8879 ± 0.01 | 0.8879 ± 0.01 | | |
| ResNet50 | 0.9661 ± 0.00 | 0.9661 ± 0.00 | 0.9785 ± 0.00 (p = 0.03) | 0.9785 ± 0.00 (p = 0.03) |
| DenseNet121 | 0.9659 ± 0.00 | 0.9659 ± 0.00 | 0.9806 ± 0.00 (p = 0.03) | 0.9806 ± 0.00 (p = 0.03) |
| EfficientNetB0 | 0.9769 ± 0.00 | 0.9769 ± 0.00 | 0.9782 ± 0.00 (p = 0.84) | 0.9782 ± 0.00 (p = 0.84) |
| LSTM | 0.7342 ± 0.26 | 0.6508 ± 0.35 | 0.9798 ± 0.00 (p = 0.03) | 0.9798 ± 0.00 (p = 0.03) |
| Transformer | 0.4999 ± 0.00 | 0.3333 ± 0.00 | 0.9813 ± 0.00 (p = 0.03) | 0.9813 ± 0.00 (p = 0.03) |

Baseline column represents the models without the MSMCE module integrated, while the MSMCE column denotes the corresponding models with the MSMCE module integrated.

Table 3. Comparison of classification performance of different models on the CRLM dataset.

| CRLM (Classes = 2) | Baseline Accuracy | Baseline F1-Score | MSMCE Accuracy | MSMCE F1-Score |
|---|---|---|---|---|
| RF | 0.9087 ± 0.00 | 0.9085 ± 0.00 | | |
| SVM | 0.8939 ± 0.00 | 0.8938 ± 0.00 | | |
| LDA | 0.9026 ± 0.00 | 0.9026 ± 0.00 | | |
| ResNet50 | 0.8818 ± 0.02 | 0.8816 ± 0.02 | 0.9212 ± 0.01 (p = 0.03) | 0.9212 ± 0.01 (p = 0.03) |
| DenseNet121 | 0.8643 ± 0.03 | 0.8641 ± 0.03 | 0.9219 ± 0.01 (p = 0.03) | 0.9219 ± 0.01 (p = 0.03) |
| EfficientNetB0 | 0.8870 ± 0.00 | 0.8867 ± 0.00 | 0.9189 ± 0.00 (p = 0.03) | 0.9188 ± 0.00 (p = 0.03) |
| LSTM | 0.5000 ± 0.00 | 0.3333 ± 0.00 | 0.9221 ± 0.01 (p = 0.03) | 0.9220 ± 0.01 (p = 0.03) |
| Transformer | 0.5000 ± 0.00 | 0.3333 ± 0.00 | 0.9154 ± 0.01 (p = 0.03) | 0.9153 ± 0.01 (p = 0.03) |

Table 4. Comparison of classification performance of different models on the RCC dataset.

| RCC (Classes = 2) | Baseline Accuracy | Baseline F1-Score | MSMCE Accuracy | MSMCE F1-Score |
|---|---|---|---|---|
| RF | 0.8055 ± 0.00 | 0.7369 ± 0.01 | | |
| SVM | 0.6754 ± 0.00 | 0.4811 ± 0.00 | | |
| LDA | 0.6352 ± 0.01 | 0.6056 ± 0.02 | | |
| ResNet50 | 0.7803 ± 0.01 | 0.7431 ± 0.02 | 0.8082 ± 0.01 (p = 0.03) | 0.7872 ± 0.01 (p = 0.03) |
| DenseNet121 | 0.7494 ± 0.01 | 0.6946 ± 0.02 | 0.8156 ± 0.01 (p = 0.03) | 0.7952 ± 0.00 (p = 0.03) |
| EfficientNetB0 | 0.8109 ± 0.01 | 0.7826 ± 0.01 | 0.8064 ± 0.01 (p = 0.69) | 0.7864 ± 0.01 (p = 0.69) |
| LSTM | 0.6991 ± 0.05 | 0.5354 ± 0.16 | 0.7865 ± 0.06 (p = 0.16) | 0.7232 ± 0.16 (p = 0.16) |
| Transformer | 0.6777 ± 0.00 | 0.4000 ± 0.00 | 0.7709 ± 0.06 (p = 0.04) | 0.6946 ± 0.16 (p = 0.04) |

Moreover, across these three datasets, the Transformer model, after being integrated with the MSMCE module, successfully avoided the extremely low Accuracy and F1-Score observed in the baseline. It was able to converge stably and learn effective discriminative features, which further underscores the role of the MSMCE module in optimizing input representations and improving the training dynamics of complex models. To illustrate this change more intuitively, Fig 5 compares the confusion matrices for the Baseline Transformer and the MSMCE-Transformer on the NSCLC, CRLM, and RCC datasets. The figure shows how the MSMCE module helps the Transformer escape the predicament of predicting all samples as a single class and achieve effective discrimination between the two classes.

Fig 5. Comparison of confusion matrices for transformer and MSMCE-transformer on three binary classification datasets.

This figure presents a comparative analysis of the confusion matrices for the baseline Transformer model (top row) and the MSMCE-Transformer model (bottom row) across the NSCLC, CRLM, and RCC datasets. The confusion matrices for the baseline Transformer clearly reveal that the model predicts all samples as a single class, resulting in extremely poor discriminative performance. In stark contrast, the confusion matrices for the MSMCE-Transformer model show a high concentration of values along the diagonal, indicating high numbers of true positives and true negatives.
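The baseline Transformer values in Tables 2 and 3 (accuracy near 0.50, F1-Score near 0.33) are what a single-class predictor yields on a balanced binary test set. The sketch below shows this arithmetic; it assumes the reported F1-Score is macro-averaged, which this section does not state, and the sample counts and function name are hypothetical.

```python
def metrics_from_confusion(cm):
    """Return (accuracy, macro_f1) for a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    f1_scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp
        fn = sum(cm[c]) - tp
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / k


# Hypothetical balanced test set: 500 samples per class, all predicted as class 0.
collapsed = [[500, 0],
             [500, 0]]
acc, macro_f1 = metrics_from_confusion(collapsed)
print(round(acc, 4), round(macro_f1, 4))  # 0.5 0.3333
```

The same calculation with a majority class covering 78.48% of the samples gives a macro F1 of 0.4397, which matches the baseline Transformer row in Table 5 and is consistent with the macro-averaging assumption.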

It is noteworthy that on the NSCLC dataset, the RF model achieved the highest F1-Score of 0.9935 among all tested models. However, on the CRLM and RCC datasets, the RF model’s performance did not surpass that of the CNN models integrated with MSMCE. This may suggest that the inherent feature space of the NSCLC dataset is particularly well-suited for learning by tree-based ensemble methods like Random Forest. Nevertheless, the key finding is that the MSMCE module improved performance for most of the tested deep learning models on these datasets, the exceptions being the statistically non-significant differences noted above (EfficientNetB0 on NSCLC and RCC, and LSTM on RCC). This highlights its general utility in optimizing raw MS data representation for deep learning classification.

Small-scale multi-class classification dataset

To further validate the effectiveness of the MSMCE module, experiments were conducted on the Canine Sarcoma dataset. This dataset is characterized by numerous classes and a limited sample size, and it was utilized for evaluating performance on both binary and 12-class classification tasks.

The experimental results indicate that even on datasets with a limited sample size and numerous classes, models integrated with the MSMCE module still achieve significant performance improvements. As shown in Table 5, for the binary classification task on the Canine Sarcoma dataset, the accuracy of the Transformer improved from 0.7848 to 0.9933, a relative increase of 26.57% (p = 0.03, N = 6). For the 12-class classification task on the Canine Sarcoma dataset, the results in Table 6 show that the accuracy of ResNet50 improved from 0.7235 to 0.9043, a relative increase of 24.99% (p = 0.03, N = 6). The t-SNE visualization of the 12-class Canine Sarcoma dataset, as shown in Fig 3(b), reveals significant class overlaps among its sarcoma subtypes, highlighting the substantial inherent difficulty in distinguishing them. The progress made by MSMCE-ResNet50 underscores the effectiveness of the MSMCE module in learning more discriminative representations capable of disentangling these closely related subtypes, even with limited samples per class. Furthermore, the relatively suboptimal performance of the MSMCE-Transformer model on the 12-class Canine Sarcoma task might be attributed to the inherent complexity of the Transformer model combined with the limited number of training samples available for each fine-grained subtype, which may have hindered the model from learning robust discriminative features for all classes.
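The percentages quoted above are relative gains over the baseline value, not differences in percentage points. A minimal sketch of the arithmetic, using the accuracies reported in Tables 5 and 6 (the function name is illustrative):

```python
def relative_gain_percent(baseline, improved):
    """Relative change from baseline to improved, expressed in percent."""
    return (improved - baseline) / baseline * 100.0


# Transformer, Canine Sarcoma, binary task (Table 5)
print(f"{relative_gain_percent(0.7848, 0.9933):.2f}")  # 26.57
# ResNet50, Canine Sarcoma, 12-class task (Table 6)
print(f"{relative_gain_percent(0.7235, 0.9043):.2f}")  # 24.99
# For comparison, the absolute gain for the Transformer in percentage points
print(f"{(0.9933 - 0.7848) * 100:.2f}")  # 20.85
```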

Table 5. Comparison of classification performance of different models on the canine sarcoma dataset (Classes = 2).

| Canine sarcoma (Classes = 2) | Baseline Accuracy | Baseline F1-Score | MSMCE Accuracy | MSMCE F1-Score |
|---|---|---|---|---|
| RF | 0.9836 ± 0.00 | 0.9750 ± 0.00 | | |
| SVM | 0.9768 ± 0.00 | 0.9646 ± 0.00 | | |
| LDA | 0.9357 ± 0.01 | 0.9082 ± 0.01 | | |
| ResNet50 | 0.9671 ± 0.01 | 0.9509 ± 0.02 | 0.9955 ± 0.00 (p = 0.03) | 0.9934 ± 0.00 (p = 0.03) |
| DenseNet121 | 0.9783 ± 0.01 | 0.9679 ± 0.01 | 0.9940 ± 0.00 (p = 0.03) | 0.9912 ± 0.00 (p = 0.03) |
| EfficientNetB0 | 0.9783 ± 0.00 | 0.9683 ± 0.00 | 0.9933 ± 0.00 (p = 0.03) | 0.9901 ± 0.00 (p = 0.03) |
| LSTM | 0.6951 ± 0.24 | 0.4278 ± 0.15 | 0.9955 ± 0.00 (p = 0.03) | 0.9934 ± 0.00 (p = 0.03) |
| Transformer | 0.7848 ± 0.00 | 0.4397 ± 0.00 | 0.9933 ± 0.01 (p = 0.03) | 0.9902 ± 0.01 (p = 0.03) |

Table 6. Comparison of classification performance of different models on the canine sarcoma dataset (Classes = 12).

| Canine sarcoma (Classes = 12) | Baseline Accuracy | Baseline F1-Score | MSMCE Accuracy | MSMCE F1-Score |
|---|---|---|---|---|
| RF | 0.8610 ± 0.01 | 0.8427 ± 0.02 | | |
| SVM | 0.8498 ± 0.01 | 0.8303 ± 0.01 | | |
| LDA | 0.7721 ± 0.02 | 0.7700 ± 0.02 | | |
| ResNet50 | 0.7235 ± 0.04 | 0.7224 ± 0.04 | 0.9043 ± 0.03 (p = 0.03) | 0.9244 ± 0.03 (p = 0.03) |
| DenseNet121 | 0.7265 ± 0.04 | 0.7264 ± 0.04 | 0.9178 ± 0.03 (p = 0.03) | 0.9362 ± 0.02 (p = 0.03) |
| EfficientNetB0 | 0.8520 ± 0.02 | 0.8474 ± 0.02 | 0.8969 ± 0.03 (p = 0.03) | 0.9163 ± 0.03 (p = 0.03) |
| LSTM | 0.1854 ± 0.11 | 0.0353 ± 0.04 | 0.8767 ± 0.03 (p = 0.03) | 0.8973 ± 0.03 (p = 0.03) |
| Transformer | 0.1883 ± 0.03 | 0.0263 ± 0.00 | 0.7324 ± 0.10 (p = 0.03) | 0.7352 ± 0.12 (p = 0.03) |

These results suggest that the MSMCE module, by constructing multi-channel dependencies, enhances the expressive power of raw MS data. Even on datasets with limited samples and numerous classes, models integrated with MSMCE maintain high classification performance. This enhanced feature representation improves the model’s robustness when handling highly complex and diverse mass spectrometry data.

Ablation study

To evaluate the actual contribution of the MSMCE module to model performance, we conducted an ablation study on the 12-class Canine Sarcoma dataset. ResNet50 was selected as the base model, and we progressively introduced the Encoder, Channel Embedding, and Channel Concatenation components to systematically assess their impact on classification performance. For this ablation study, model variants were trained on the entire training set (not subjected to k-fold splitting), and their performance was subsequently evaluated on the independent hold-out test set.

As shown in Table 7, introducing the Encoder module into the Baseline model leads to a moderate improvement in accuracy and F1-Score, indicating that the Encoder plays a role in initial feature extraction. However, the validation accuracy curve for the Encoder exhibits significant fluctuations (see Fig 6), suggesting that the model’s generalization ability is still limited. With the addition of the Channel Embedding module, the model’s performance improves significantly, achieving an accuracy of 0.8834 and an F1-Score of 0.8888. This result demonstrates that the Channel Embedding module effectively captures complex relationships between multiple channels, enhancing feature representation. Additionally, the validation performance becomes more stable. Building on this, incorporating the Channel Concatenation, which fuses encoded features with channel-embedded features, further optimizes model performance, reaching its best results. This highlights the importance of feature fusion in enhancing classification performance by integrating global and local feature information.
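The fusion step can be pictured as stacking feature maps along the channel axis. The following is only an illustrative sketch in plain Python: the shapes (one encoded channel and three embedded channels of length 4), the values, and the function name are hypothetical, and the actual module in the authors' repository may be organized differently.

```python
def concat_channels(encoded, embedded):
    """Stack two (channels, length) feature maps along the channel axis."""
    lengths = {len(ch) for ch in encoded + embedded}
    if len(lengths) != 1:
        raise ValueError("all channels must share the same length")
    return [list(ch) for ch in encoded] + [list(ch) for ch in embedded]


# Hypothetical encoded representation: 1 channel x 4 features.
encoded = [[0.1, 0.2, 0.3, 0.4]]
# Hypothetical channel embeddings: 3 channels x 4 features.
embedded = [[1.0, 0.0, 0.0, 1.0],
            [0.5, 0.5, 0.5, 0.5],
            [0.0, 1.0, 1.0, 0.0]]

fused = concat_channels(encoded, embedded)
print(len(fused), len(fused[0]))  # 4 4
```

The downstream classifier then receives both the initial encoding and the channel embeddings, rather than the embeddings alone.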

Table 7. Ablation study results: Impact of different module combinations on ResNet-50 classification performance.

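| Baseline (ResNet-50) | Encoder | Channel Embedding | Channel Concatenation | Accuracy | F1-Score |
|---|---|---|---|---|---|
| ✓ | | | | 0.7803 | 0.7798 |
| ✓ | ✓ | | | 0.8072 | 0.7955 |
| ✓ | ✓ | ✓ | | 0.8834 | 0.8888 |
| ✓ | ✓ | ✓ | ✓ | 0.9417 | 0.9410 |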

Fig 6. Accuracy trends during training and validation in the ablation study.

This figure illustrates the accuracy changes on the training and validation sets for ResNet-50 and its variants with progressively introduced MSMCE submodules. The results highlight the impact of different components on model performance. Before introducing the Channel Embedding module, the accuracy curves exhibit significant fluctuations. However, after incorporating Channel Embedding and Channel Concatenation, the accuracy curves become notably more stable, indicating improved model robustness and convergence.

Computational efficiency analysis

The computational efficiency of the different deep learning models was evaluated based on two key metrics: Floating Point Operations (FLOPs) and the estimated peak GPU memory footprint during training. For each deep learning architecture, FLOPs were calculated using the profile function from the Python library thop, by passing a single-sample input tensor of shape (1, spectrum dim) to the function. The model size is reported as the Estimated Total Size (MB) given by the summary function from the Python library torchinfo. This estimate was based on an input size of (batch size, spectrum dim), to reflect the peak GPU memory usage during one complete forward and backward pass. As shown in Fig 7, ResNet50, DenseNet121, and EfficientNetB0 exhibit higher FLOPs and larger model sizes, indicating greater computational overhead. However, after being integrated with the MSMCE module, the FLOPs of all three models are substantially reduced (as detailed in S1 File), and the model sizes also decrease, while classification accuracy improves considerably. These findings suggest that the MSMCE module optimizes computational resource utilization by employing an efficient feature embedding approach, enabling models to achieve superior performance with reduced computational cost.

Fig 7. Radar chart of model training efficiency on the Canine Sarcoma (12-class) Dataset.

This figure presents the computational efficiency of ResNet-50, DenseNet-121, EfficientNet-B0, LSTM, and Transformer, along with their corresponding MSMCE-enhanced versions. It can be observed that MSMCE-enhanced models achieve significantly improved classification accuracy while maintaining lower computational costs (FLOPs) and smaller model sizes. Radar charts for training efficiency on other datasets are provided in the supplementary materials.

This efficiency improvement is attributed to the MSMCE module, which first reduces the dimensionality of the high-dimensional data before channel embedding. This significantly decreases the data dimension, so that subsequent convolution operations require less computational effort to capture structural information, thereby reducing unnecessary computational overhead in high-dimensional sparse spaces. Additionally, although the LSTM and Transformer models experience an increase in computational demand (higher FLOPs and model size) after being integrated with MSMCE, their training stability improves significantly. This is a critical trade-off, as it prevents the convergence failures (such as converging to a single-class prediction) observed in the baseline models.

Notably, while the MSMCE module itself increases the model’s depth, it ultimately enhances computational efficiency for the CNN-based models. Furthermore, we observe consistent performance improvements across models of different depths, indicating that MSMCE can be effectively integrated with various model architectures, making it suitable for both residual and non-residual network structures.

Discussion

The performance of machine learning methods is heavily dependent on the choice of data representation (or features) to which they are applied [39]. Constructing efficient and accurate representations for high-dimensional, complex MS vectors is a fundamental challenge in the field of MS data analysis. Particularly when using raw MS data for deep learning classification, how to effectively transform raw, high-dimensional MS vectors into an input representation that models can learn efficiently is a key problem worth exploring.

This study proposes a supervised representation learning module based on multi-channel embedding, MSMCE, which converts raw MS vectors into multi-channel embedded representations, significantly enhancing feature expressiveness and thereby optimizing the input for downstream deep learning models. Results show that this method also achieves significant performance improvements on datasets with numerous classes and limited sample sizes. Compared to traditional single-channel representations, the multi-channel embedding representation demonstrates significant advantages in improving classification accuracy, model training stability, and generalization capability. This is validated in Tables 2–4, where the Baseline Transformer exhibits extremely poor classification performance when directly processing high-dimensional, single-channel data. The model collapses to a trivial solution in which it predicts a single class for all samples, a failure that resembles what is termed “mode collapse” in generative modeling. In contrast, the MSMCE module effectively avoids this issue by providing a more expressive and structured input, ensuring that the model can converge properly and learn discriminative feature representations. Additionally, the ablation study further validates the contributions of key components within the MSMCE module, including the Encoder, Channel Embedding, and Channel Concatenation, to enhancing feature representation and classification performance. Therefore, the proposed MSMCE module provides a novel and efficient feature representation method for deep learning classification directly using raw MS data, addressing the limited representational power of single-channel representations while also expanding the application prospects of representation learning in the field of MS data analysis.

Limitations and future directions

Although this study has made significant progress, several aspects require further exploration. First, while multi-channel embedding demonstrates robustness on multi-class, small-sample datasets, its performance may still be limited in extreme cases, such as severely imbalanced class distributions in which some classes have very few samples. This is because deep learning models rely heavily on data, and when the sample size is too small, the model may fail to effectively learn the feature representations in the embedding space, leading to training failure. Second, the choice of embedding dimensions and parameter optimization still require dataset-specific tuning. Future research could incorporate adaptive feature selection mechanisms to make feature representations more dynamic and better suited for different datasets. Lastly, a major limitation of deep learning models in clinical decision-making is the lack of well-defined interpretability methods [40]. While multi-channel embedding provides rich feature representations, a notable limitation of the current study is that we have not deeply investigated the direct biological interpretation of these learned features or how they correlate with specific clinically relevant molecular patterns. Our primary focus was on computational methodology and the empirical demonstration of performance gains achieved by the MSMCE module. Uncovering the biological significance of the features learned by the different channels within MSMCE embeddings (for example, by identifying which m/z regions or patterns contribute most to the classifications and correlating these with known biomarkers or biological pathways) is a complex task that would require further dedicated bioinformatics analysis and potentially experimental validation, extending beyond the scope of this initial methodological work. This remains an important open research question and a key direction for future studies aimed at enhancing the translational potential and interpretability of the MSMCE module in biomedical applications.

Future work can focus on improving the interpretability of the MSMCE module, further uncovering the biological significance of embedded features across different channels to enhance the credibility of the model in clinical applications. Additionally, future studies could evaluate the effectiveness of this module in transfer learning scenarios, exploring its transferability across different datasets and tasks to assess its generalizability in mass spectrometry data analysis.

Conclusion

This study proposes a supervised representation learning module based on multi-channel embedding, MSMCE, which transforms raw MS vectors into multi-channel embedded representations, significantly enhancing feature expressiveness. Experimental results demonstrate that this method not only improves classification accuracy but also enhances model stability and generalization capability. Additionally, ablation studies validate the critical roles of the Encoder, Channel Embedding, and Channel Concatenation, further proving the effectiveness of the MSMCE module in feature learning. Overall, this study provides a novel feature representation method for raw MS data analysis, expanding the application prospects of representation learning in the field and offering new technical support for raw MS data analysis.

Code availability

The source code for this study is available on GitHub at https://github.com/WoodFY/MSMCE.

Supporting information

S1 File. Estimated model size & FLOPs.

Tabulated summary of estimated peak GPU memory footprint (MB) and Floating-Point Operations (FLOPs) for baseline and MSMCE-enhanced deep learning models.

(XLSX)

pone.0321239.s001.xlsx (17.2KB, xlsx)
S2 File. Experiments Bootstrap confidence intervals.

Bootstrap 95% confidence intervals for mean performance metrics (Accuracy, Precision, Recall, F1-Score) from k-fold cross-validation on the hold-out test set for all evaluated models.

(XLSX)

pone.0321239.s002.xlsx (20KB, xlsx)

Acknowledgments

This work received computational support from the National Institute of Metrology, China.

Data Availability

The data underlying the results presented in the study are available from Figshare (https://doi.org/10.6084/m9.figshare.29148629.v1).

Funding Statement

This work was supported by the Science & Technology Fundamental Resources Investigation Program (Grant No. 2022FY101200) awarded to XX.

References

  • 1. Beck AG, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttamaa HI, et al. Recent developments in machine learning for mass spectrometry. ACS Measure Sci Au. 2024;4(3):233–46. doi: 10.1021/acsmeasuresciau.3c00060
  • 2. Xie YR, Castro DC, Bell SE, Rubakhin SS, Sweedler JV. Single-cell classification using mass spectrometry through interpretable machine learning. Anal Chem. 2020;92(13):9338–47. doi: 10.1021/acs.analchem.0c01660
  • 3. Brorsen LF, McKenzie JS, Tullin MF, Bendtsen KMS, Pinto FE, Jensen HE, et al. Cutaneous squamous cell carcinoma characterized by MALDI mass spectrometry imaging in combination with machine learning. Sci Rep. 2024;14(1):11091. doi: 10.1038/s41598-024-62023-0
  • 4. Wu B, Abbott T, Fishman D, McMurray W, Mor G, Stone K, et al. Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics. 2003;19(13):1636–43. doi: 10.1093/bioinformatics/btg210
  • 5. Gredell DA, Schroeder AR, Belk KE, Broeckling CD, Heuberger AL, Kim SY, et al. Comparison of machine learning algorithms for predictive modeling of beef attributes using rapid evaporative ionization mass spectrometry (REIMS) data. Mass Spectrometry Imaging in Food Analysis. CRC Press; 2020. pp. 181–94.
  • 6. Datta S, DePadilla LM. Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples. Stat Methodol. 2006;3(1):79–92.
  • 7. Vervier K, Mahé P, Veyrieras JB, Vert JP. Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data. arXiv preprint. 2015. doi: 10.48550/arXiv.1506.07251
  • 8. Hilario M, Kalousis A, Pellegrini C, Müller M. Processing and classification of protein mass spectra. Mass Spectrom Rev. 2006;25(3):409–49. doi: 10.1002/mas.20072
  • 9. Seddiki K, Precioso FE, Sanabria M, Salzet M, Fournier I, Droit A. Early diagnosis: end-to-end CNN-LSTM models for mass spectrometry data classification. Anal Chem. 2023;95(36):13431–7. doi: 10.1021/acs.analchem.3c00613
  • 10. Seddiki K, Saudemont P, Precioso F, Ogrinc N, Wisztorski M, Salzet M, et al. Cumulative learning enables convolutional neural network representations for small mass spectrometry data classification. Nat Commun. 2020;11(1):5595. doi: 10.1038/s41467-020-19354-z
  • 11. Niu J, Xu W, Wei D, Qian K, Wang Q. Deep learning framework for integrating multibatch calibration, classification, and pathway activities. Anal Chem. 2022;94(25):8937–46. doi: 10.1021/acs.analchem.2c00601
  • 12. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 770–8.
  • 13. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. pp. 4700–8.
  • 14. Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. 2019. pp. 6105–14.
  • 15. Hochreiter S. Long Short-term Memory. Neural Computation MIT-Press; 1997.
  • 16. Vaswani A. Attention is all you need. In: Advances in Neural Information Processing Systems. 2017.
  • 17. Mathema VB, Sen P, Lamichhane S, Orešič M, Khoomrung S. Deep learning facilitates multi-data type analysis and predictive biomarker discovery in cancer precision medicine. Comput Struct Biotechnol J. 2023;21:1372–82. doi: 10.1016/j.csbj.2023.01.043
  • 18. Deng Y, Yao Y, Wang Y, Yu T, Cai W, Zhou D, et al. An end-to-end deep learning method for mass spectrometry data analysis to reveal disease-specific metabolic profiles. Nat Commun. 2024;15(1):7136. doi: 10.1038/s41467-024-51433-3
  • 19. Tran NH, Zhang X, Xin L, Shan B, Li M. De novo peptide sequencing by deep learning. Proc Natl Acad Sci U S A. 2017;114(31):8247–52. doi: 10.1073/pnas.1705691114
  • 20. Karunratanakul K, Tang H-Y, Speicher DW, Chuangsuwanich E, Sriswasdi S. Uncovering thousands of new peptides with sequence-mask-search hybrid de novo peptide sequencing framework. Mol Cell Proteomics. 2019;18(12):2478–91. doi: 10.1074/mcp.TIR119.001656
  • 21. Qiao R, Tran NH, Xin L, Chen X, Li M, Shan B, et al. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat Mach Intell. 2021;3(5):420–5.
  • 22. Yilmaz M, Fondrie W, Bittremieux W, Oh S, Noble WS. De novo mass spectrometry peptide sequencing with a transformer model. In: International Conference on Machine Learning. 2022. pp. 25514–22.
  • 23. Petrovskiy DV, Nikolsky KS, Kulikova LI, Rudnev VR, Butkova TV, Malsagova KA, et al. PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models. Sci Rep. 2024;14(1):15000. doi: 10.1038/s41598-024-65861-0
  • 24. Wang Y, Li D, Chen C, Cai X, Sun S, Cui X. Learned fingerprint embedding for large-scale peptide mass spectra retrieval. In: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2023. pp. 306–11.
  • 25. Park J, Jo J, Yoon S. Mass spectra prediction with structural motif-based graph neural networks. Sci Rep. 2024;14(1):1400. doi: 10.1038/s41598-024-51760-x
  • 26. Huber F, Ridder L, Verhoeven S, Spaaks JH, Diblen F, Rogers S, et al. Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput Biol. 2021;17(2):e1008724. doi: 10.1371/journal.pcbi.1008724
  • 27. Huber F, van der Burg S, van der Hooft JJJ, Ridder L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J Cheminform. 2021;13(1):84. doi: 10.1186/s13321-021-00558-4
  • 28. Guo H, Xue K, Sun H, Jiang W, Pu S. Contrastive learning-based embedder for the representation of tandem mass spectra. Anal Chem. 2023;95(20):7888–96. doi: 10.1021/acs.analchem.3c00260
  • 29. Xiao Y, Wu J, Yuan J. mCENTRIST: a multi-channel feature generation mechanism for scene categorization. IEEE Transac Image Process. 2013;23(2):823–36. doi: 10.1109/TIP.2013.2295756
  • 30. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pp. 1–9.
  • 31. Petrovsky DV, Kopylov AT, Rudnev VR, Stepanov AA, Kulikova LI, Malsagova KA, et al. Managing of unassigned mass spectrometric data by neural network for cancer phenotypes classification. J Pers Med. 2021;11(12):1288. doi: 10.3390/jpm11121288
  • 32. Wang G, Ruser H, Schade J, Passig J, Adam T, Dollinger G, et al. 1D-CNN network based real-time aerosol particle classification with single-particle mass spectrometry. IEEE Sensors Lett. 2023.
  • 33. Papagiannopoulou C, Parchen R, Rubbens P, Waegeman W. Fast pathogen identification using single-cell matrix-assisted laser desorption/ionization-aerosol time-of-flight mass spectrometry data and deep learning methods. Anal Chem. 2020;92(11):7523–31. doi: 10.1021/acs.analchem.9b05806
  • 34. Cadow J, Manica M, Mathis R, Reddel RR, Robinson PJ, Wild PJ, et al. On the feasibility of deep learning applications using raw mass spectrometry data. Bioinformatics. 2021;37(Supplement_1):i245–53. doi: 10.1093/bioinformatics/btab311
  • 35. Saudemont P, Quanico J, Robin YM, Baud A, Balog J, Fatou B, et al. Real-time molecular diagnosis of tumors using water-assisted laser desorption/ionization mass spectrometry technology. Cancer Cell. 2018;34(5):840–51. doi: 10.1016/j.ccell.2018.10.001
  • 36. Zhang W, Wei Y, Ignatchenko V, Li L, Sakashita S, Pham N-A, et al. Proteomic profiles of human lung adeno and squamous cell carcinoma using super-SILAC and label-free quantification approaches. Proteomics. 2014;14(6):795–803. doi: 10.1002/pmic.201300382
  • 37. van Huizen NA, van den Braak RR, Doukas M, Dekker LJ, IJzermans JN, Luider TM. Up-regulation of collagen proteins in colorectal liver metastasis compared with normal liver tissue. J Biol Chem. 2019;294(1):281–9. doi: 10.1074/jbc.RA118.005087
  • 38. Bifarin OO, Gaul DA, Sah S, Arnold RS, Ogan K, Master VA, et al. Machine learning-enabled renal cell carcinoma status prediction using multiplatform urine-based metabolomics. J Proteome Res. 2021;20(7):3629–41. doi: 10.1021/acs.jproteome.1c00213
  • 39. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798–828. doi: 10.1109/TPAMI.2013.50
  • 40. Venugopalan J, Tong L, Hassanzadeh HR, Wang MD. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci Rep. 2021;11(1):3254. doi: 10.1038/s41598-020-74399-w

Decision Letter 0

Hirenkumar Mewada

6 May 2025

PONE-D-25-08088

MSMCE: A Novel Representation Module for Classification of Raw Mass Spectrometry Data

PLOS ONE

Dear Dr. Xiong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jun 20 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Hirenkumar Kantilal Mewada

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that funding information should not appear in any section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript presents MSMCE, a novel deep learning-based feature representation module designed for the classification of raw mass spectrometry (MS) data. The proposed method addresses known limitations of single-channel feature representations by introducing multi-channel embeddings with a residual-like concatenation strategy. The model is well-motivated, technically sound, and its performance is evaluated across four diverse and publicly available datasets using multiple neural network architectures (e.g., ResNet, DenseNet, LSTM, Transformer). The availability of code and data enhances the reproducibility and transparency of the study.

The authors demonstrate notable empirical improvements in classification accuracy and computational efficiency, supported by clear ablation studies. However, the manuscript would benefit from several key improvements to strengthen its statistical and practical contributions:

Major concerns:

1. While performance metrics (accuracy, precision, recall, F1) are thoroughly reported, the manuscript lacks statistical testing to validate the significance of the observed improvements. Repeated runs with different random seeds and the use of standard tests (e.g., Wilcoxon signed-rank test, bootstrapping confidence intervals) are recommended. This is particularly important for small-sample settings (e.g., Canine Sarcoma dataset).

2. The MSMCE module's effectiveness is well demonstrated quantitatively, but the biological meaning of the learned features is not explored. Given the biomedical context of mass spectrometry data, the manuscript would benefit from a discussion (or illustrative example) of how MSMCE embeddings relate to biologically meaningful features or clinically relevant patterns. This is important for building trust and enabling adoption in translational research.

3. The manuscript refers to a “residual connection” via channel concatenation, which is misleading, as no additive operation is applied. Please revise the terminology to avoid confusion with standard residual connections.

4. Details about hyperparameter selection, learning rate tuning, and batch sizes should be more explicitly stated or summarized in a table.

5. The methods for computing FLOPs and model size should be clearly described.

6. The t-SNE plots and radar charts in the supplementary material are helpful but not discussed in the main text. Please integrate interpretations of these visualizations into the Results or Discussion to show how MSMCE improves feature separability and efficiency.

Minor concerns:

• The manuscript is well-written in general. Minor edits are suggested for typographical consistency (e.g., consistent table formatting, figure references). Also, be precise with terms such as "residual connection."

• The references are not cited in sequential order (e.g., the citation sequence jumps from [3–8] to [10], or from [24–26] to [22,32,39]), which may confuse readers and should be corrected for consistency.

• Although ReLU and Dropout are standard components in deep learning, they should be briefly explained for clarity, given that PLOS ONE targets a multidisciplinary audience that may not be familiar with these concepts.

• Figures 1 to 3 are of very poor quality, with unreadable text. They should be replaced with higher-resolution versions to ensure legibility and clarity.

Reviewer #2: The authors have proposed a novel perspective and strategy for using feature representations from convolutional neural network layers to analyze complex, spatially and temporally intertwined MS data. In the proposed MSMCE, the feature representations from two CNN layers (referred to as channels) are used to enhance data representation. This multi-channel representation, together with an encoder layer and a feature integration strategy, has enhanced the performance of the model and decreased the computational cost of model training. The authors have compared their proposed model to existing models and have done an ablation study to show the contribution of the embedded channels.

While the proposed technique enhances the performance of the model and opens a new perspective, the paper lacks major details and statistics necessary for scientific reporting. The most important missing aspect is proper randomization and testing of the model. It is not clear if the reported enhancement is significant in comparison to the existing models. From the ablation study (Table 7), I am convinced that the multi-channel embedding improves the performance, but I have major concerns about how the results have been reported. A lot of details about the models, their hyperparameters, training and testing, and regularization strategies are missing, making it hard to properly evaluate and compare the contribution of the multi-channel embedding.

Here are a few notes:

The last sentence of the abstract: “Experimental results ...MS data classification.” The link between reduced computational resource and generalizability of the model is not given and trivial.

This sentence in the introduction: “The large ... cancer detection”. The challenges need to be explicitly included. Explicitly including the challenges help to develop motivation of the paper.

This sentence: “These operations ... MS signal acquisition,” the issues need to be explicitly listed: such as peak shift, etc

This sentence: “Nevertheless, the high dimensionality of MS data combined with limited sample sizes makes it challenging for traditional methods to effectively meet the demands of data analysis [11].” It is not the high dimensionality of the data or the small sample size that limits the performance of the traditional methods, but the mismatches between batches and the need to preprocess the data so that the methods work. In fact, deep learning models need more training data than conventional methods, especially if the data has higher complexity and dimensionality. Reference 11 is not correctly cited, as it discusses using matrix factorization and a Bayesian framework to overcome such problems and does not explicitly convey the logic used in this sentence. Regardless, the sentence does not follow the logic and message of this paragraph.

This sentence: “These methods often fail to fully exploit the latent information within the data, thereby limiting their overall performance.”. No reference. This claim is not accurate and somewhat controversial. The concern is: these methods (if used correctly) are robust as cited in the same paragraph, but preprocessing data to feed to these models require careful fine tunings. Needs to be rewritten, maybe restructuring like: Achieving optimal performance with these methods typically involves multiple stages of preprocessing and fine-tuning of model parameters to fully leverage the rich, latent information embedded in complex MS data.

This sentence: “In recent years, deep learning has emerged as a dominant technology”. Deep learning is not a technology, but rather an approach or methodology.

The term channel is used in the title and as a cornerstone of the article. For the first-time reader, the term is confusing because, in the context of MS data, channel does not have a single universal definition. The authors use channel to mean the feature representations from CNN layers. This becomes clear later in the methods section. The term channel needs to be clarified and defined explicitly in the introduction. There is also some mix-up between the term channel and the concept of dimension. For example, in this sentence: “In the field of image classification, multi-channel images contain complementary information across different channels, enabling a more comprehensive description of the target compared to single-channel representations”. Consider also the confusion in this sentence: “These multi-channel features not only enrich feature representations but also enhance the model’s adaptability to high-dimensional data.” I suggest defining the term “channel” in the introduction, and then being careful not to use it interchangeably with dimension, representation, feature, etc.

This sentence does not read well: MS2DeepScore [25] employs a Siamese neural network to learn low-dimensional embeddings of MS vectors for predicting the structural similarity between chemical compounds, which is also applicable to MS Clustering. Maybe a comma is missing after vectors?

In this sentence: “Studies have shown that deep learning can directly capture complex patterns”, what is the implication of the word “directly”? What would the indirect way be? I suggest removing the word “directly”.

“deep classification model” is not a valid term.

In the section where authors outline the contribution of this study:

“End-to-End Training Framework”, training the DL+classifier model in an end-to-end manner is not innovative and is widely used. See: Seddiki, Khawla, et al. "Early diagnosis: End-to-end CNN–LSTM models for mass spectrometry data classification." Analytical Chemistry 95.36 (2023): 13431-13437.,

“Dimensional Adaptation of Multi-Channel Embedding Representations:” This contribution is not clear. It is not clear what the authors mean by “adaptability”. “Compared to the original single-channel MS vectors, the embedded multi-channel vectors exhibit better adaptability within CNN architectures.” How has the adaptability been quantified? What are the statistics of the enhanced adaptability, and where is the p-value?

General feedback to the authors: the authors have used a fully connected layer as the first layer. This is not wrong and is the authors’ choice, but in my opinion it is counterintuitive to use a flat layer to reduce dimensionality, because the local MS information is lost this way. When transforming using WX+b, all local dependence is lost (for example, peak shape). This is at odds with using a CNN in the rest of the model. In most of the cited papers, a CNN layer usually comes first, ensuring that the local dependencies of peaks are captured. But again, the model proposed by the authors is valid.

E is not a “feature matrix” in the classic sense (i.e., not raw features you designed or extracted). It is more accurate to call it: A latent embedding, or A learned representation

Figure 1: input is BxD but in the figure it is illustrated as 1-D

The addition of the channel dimension is poorly shown in Figure 1, making it harder to understand. The green input vector (labeled embeding_dim) goes directly into the 1D convolution, skipping the step that adds a dimension.

The convolutions are 1D and Eprime is Bx1xd, so what is the “3” in the dimension of K1 (1xC/2x3), and along which dimension is the convolution happening? The same question applies to K2. It seems the notation and the mapping between the sizes of K1 and K2 and the output are inconsistent.

“By transforming feature vectors into multi-channel representations, this module enhances adaptability for downstream classification tasks.” This claim needs to be proven in a quantifiable way. It is not clear what adaptability means.

The training procedure depicted in Figure 2 contradicts what has been described as the “End-to-End Training Framework”: Figure 2 shows the backpropagation path going back only to the classifier module, leaving out the MSMCE.

“enabling an adaptive representation of the input.” the term “adaptive” has not been used correctly.

2.6 Data Processing Workflow: This section starts with a few paragraphs as its own introduction. These paragraphs need to be moved to the introduction or discussion depending on the context and in this section only the method should be discussed.

The data processing section leaves most of the details out, referencing [2],[10]. details of the method should be outlined.

Section 2.7- What is the strategy to divide data to training, validation, and test sets? Has k-fold strategy been used? How many times the model has been trained? What is the strategy to avoid overfitting? Are all the results related to a single (but same for all models) trial division? Or the random seed might be different for each tested model?

In the table 2,3 and 5 there are values for precision and F1 that are below 0.5. This does not make sense, because any model that has been trained on data should be better than a totally random model with 50% performance. Also, a value of exactly 0.5000 for recall seems not to be correct considering variability in the MS data.

Tables 2 and 3 do not report the number of trained models or any variability in the performance. The absolute values show an enhancement when the MSMCE is included. However, no statistical significance metric has been reported. The proper way would have been multiple trainings using multiple random divisions of trials, then reporting an average, a standard deviation, and a p-value for each comparison.

“representing an increase of 1.608%”, this enhancement is not of value, unless repeated over several randomizations and then averaged.

Tables 2, 3,5 and 6 report performance for models LSTM and transformer, without providing details of architecture of each of these models. LSTM layers or transformer architecture could be used in a DL-based pipeline in many ways. Details need to be properly reported.

Table 6 (class12): precision and recall are reported for a multi-class arrangement. It is not clear whether the final precision is the average of the per-class precisions or whether a different strategy has been used.

Figure 4: the reported accuracy drops at some epochs, which makes the reported accuracy metric questionable, considering optimizers like Adam and loss functions like cross-entropy. How do the authors justify the drops?

ResNet-50, DenseNet-121, and EfficientNet-B0 should be properly referenced.

Figure 5, the lines should be transparent (alpha=0.8) for better visibility. The lines are blocking each other.

Reviewer #3: I would like to begin by congratulating the authors for this very interesting and well-executed piece of work.

This well-written article introduces a novel spectral representation technique that replaces traditional pre-processing methods with a neural network model employing multi-channel embedding.

The model is clearly presented and supported by a well-structured introduction. The limitations of the approach are adequately discussed in both the Materials and Methods section and the Discussion. However, I have several suggestions to enhance the clarity and completeness of the manuscript:

Image Quality: The figures are currently of insufficient quality, making them unreadable. This significantly hinders the reader's ability to follow the results. I strongly recommend improving the resolution and clarity of all figures to ensure they are legible and informative.

Description of the Data: More detailed information about the dataset is necessary. Readers unfamiliar with this data may not understand its specific challenges, particularly in the context of classification tasks. Please elaborate on the nature of the dataset, potential difficulties, and how these may relate to the performance of the proposed representation technique.

Error and Robustness Metrics: While the provided metrics illustrate the model’s performance, the lack of error analysis is a limitation. Including confidence intervals or other measures of uncertainty would provide a clearer picture of the method’s robustness. It would also facilitate a more rigorous comparison between methods.

Comparison with Traditional Methods: The proposed technique is positioned as an improvement over traditional feature engineering methods. However, the manuscript does not clearly specify which traditional techniques were used to generate the “Original” baseline. Please provide more details about these baseline methods so readers can better assess the added value of MSMCE.

Once again, I thank the authors for their contribution, and I also thank the editors for giving me the opportunity to review this manuscript.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes:  Amir Akbarian

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Aug 6;20(8):e0321239. doi: 10.1371/journal.pone.0321239.r002

Author response to Decision Letter 1


27 May 2025

We have uploaded the Response to Reviewers.docx. All responses are in this file.

Reviewer #1

<COMMENT #1-1> 1. While performance metrics (accuracy, precision, recall, F1) are thoroughly reported, the manuscript lacks statistical testing to validate the significance of the observed improvements. Repeated runs with different random seeds and the use of standard tests (e.g., Wilcoxon signed-rank test, bootstrapping confidence intervals) are recommended. This is particularly important for small-sample settings (e.g., Canine Sarcoma dataset).

<RESPONSE #1-1> Thank you for this important suggestion regarding statistical validation of our results. We have addressed this by incorporating more rigorous statistical analysis in our revised manuscript.

Specifically:

Our experimental design, as detailed in Section 'Experimental Setup and Training Strategy', already employs a k-fold stratified cross-validation (with k=6) where each model is trained k times on different folds of the training data and evaluated on a common hold-out test set. This provides multiple performance measurements for each model on the test set.

Bootstrap Confidence Intervals: We now report the 95% bootstrap confidence intervals for the mean performance metrics (Accuracy, Precision, Recall, F1-score) obtained from these k-fold evaluations on the hold-out test set. These confidence intervals provide a measure of the uncertainty around our reported mean values and are summarized in the table provided in the Supporting Information (Supporting File S3).

Wilcoxon Signed-Rank Test for Paired Comparisons: To assess the statistical significance of the improvements observed when using our MSMCE module compared to baseline models (i.e., models without MSMCE), we have performed paired Wilcoxon signed-rank tests. These tests compare the performance metrics (e.g., accuracy) of the MSMCE-enhanced model versus its corresponding baseline model across the k test set evaluations from the k-fold cross-validation. The resulting p-values are now reported in the text when discussing these comparisons in the 'Experiment and Results' section (e.g., "MSMCE-ResNet50 showed a statistically significant improvement in accuracy over the baseline ResNet-50 (p = 0.03125)").

While we used a fixed global random seed for reproducibility in the current k-fold cross-validation framework, the use of k-fold itself provides multiple evaluation points. We believe these additions, particularly the inclusion of bootstrap confidence intervals and Wilcoxon signed-rank test results, now provide the necessary statistical validation for the observed performance improvements, especially for datasets like Canine Sarcoma.
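For illustration, the paired comparison and the bootstrap interval described in this response can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' analysis code: the per-fold accuracies below are hypothetical placeholders, the function names are invented for the example, the test is the exact two-sided Wilcoxon signed-rank test without tie handling, and the interval is a simple percentile bootstrap of the mean.

```python
import random
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Simplifying assumption: no zero differences and no tied absolute
    differences, so the ranks are simply 1..n.
    """
    diffs = [a - b for a, b in zip(x, y)]
    abs_diffs = sorted(abs(d) for d in diffs)
    if 0 in abs_diffs or len(set(abs_diffs)) != len(abs_diffs):
        raise ValueError("zero or tied differences are not handled in this sketch")
    rank = {value: i + 1 for i, value in enumerate(abs_diffs)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank[abs(d)] for d in diffs if d < 0)
    statistic = min(w_plus, w_minus)

    n = len(diffs)
    # Under the null hypothesis every rank is equally likely to carry a plus
    # or a minus sign, so enumerate all 2**n sign assignments.
    as_extreme = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= statistic
    )
    p_value = min(1.0, 2 * as_extreme / 2 ** n)
    return statistic, p_value


def bootstrap_ci_mean(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical test-set accuracies from k = 6 folds (placeholders, not results).
baseline_acc = [0.90, 0.88, 0.91, 0.89, 0.92, 0.87]
msmce_acc = [0.93, 0.92, 0.93, 0.94, 0.93, 0.93]

stat, p = wilcoxon_signed_rank_exact(msmce_acc, baseline_acc)
print(stat, p)  # 0 0.03125
print(bootstrap_ci_mean(msmce_acc))
```

With six paired measurements, the exact two-sided test cannot return a p-value below 2/2^6 = 0.03125, which is the value reached whenever one model wins on every fold, as in the placeholder data above.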

<COMMENT #1-2> 2. The MSMCE module's effectiveness is well demonstrated quantitatively, but the biological meaning of the learned features is not explored. Given the biomedical context of mass spectrometry data, the manuscript would benefit from a discussion (or illustrative example) of how MSMCE embeddings relate to biologically meaningful features or clinically relevant patterns. This is important for building trust and enabling adoption in translational research.

<RESPONSE #1-2> We sincerely thank the reviewer for this insightful comment and for highlighting the importance of exploring the biological meaning of the learned features, especially given the biomedical context of mass spectrometry data. We agree that understanding how MSMCE embeddings might relate to biologically significant features or clinically relevant patterns is crucial for building trust and facilitating translational research.

In the current study, our primary objective was to investigate, from a computational and data science perspective, a novel feature representation methodology for raw mass spectrometry data. Specifically, we focused on the development and empirical validation of the MSMCE and its impact on the classification performance and training efficiency of deep learning models. Our research aimed to determine whether transforming single-channel MS data into a multi-channel representation could provide tangible benefits for model training and predictive accuracy, as demonstrated by our quantitative results across four public datasets.

Therefore, while our current work establishes the computational advantages of the MSMCE module, we consider the interpretation of its learned features in a biological or clinical context as an important avenue for future research. We have now explicitly addressed this point in the revised manuscript by expanding the 'Limitations and future directions' subsection within the 'Discussion' section. In this revised section, we clearly state that uncovering the biological significance of the MSMCE embeddings is a key area for subsequent investigation.

We appreciate the reviewer's guidance in emphasizing this aspect, as it will undoubtedly shape our future work in this domain.

<COMMENT #1-3> 3. The manuscript refers to a “residual connection” via channel concatenation, which is misleading, as no additive operation is applied. Please revise the terminology to avoid confusion with standard residual connections.

<RESPONSE #1-3> We sincerely thank the reviewer for this precise and important observation. We acknowledge that our use of the term 'residual connection' in the context of channel concatenation could be misleading, as it does not involve the additive operation characteristic of standard residual connections found in architectures like ResNet. Our intention was to describe an operation that similarly aims to preserve and pass through earlier-stage features (the globally encoded feature E′) alongside newly transformed features (the multi-channel convolutional embeddings C_2).

To address this and ensure clarity, we have revised the manuscript to more accurately describe this operation. Specifically:

We have removed the term 'residual connection' where it might cause confusion with additive skip connections.

We now describe the operation explicitly as 'feature concatenation along the channel dimension to integrate global and local embeddings.'

The purpose of this concatenation, namely to combine the initial globally encoded representation E′ (with shape B×1×d) with the locally refined multi-channel convolutional features C_2 (with shape B×C×d) to form a richer, combined representation O (with shape B×(1+C)×d), is now stated more directly without relying on the 'residual' analogy.

These changes have been implemented in the Abstract, Introduction, the 'Materials and Methods' section (specifically the subsection previously titled 'Channel “Residual” Connection', which has been renamed, and in its description), and the Discussion. We believe these revisions accurately reflect our methodology and prevent any potential misunderstanding with standard additive residual connections.
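The distinction drawn in this response can be made concrete with a small shape check. The sketch below is illustrative only and uses plain nested lists rather than any deep learning framework. The helper names and the sizes B, C and d are assumptions chosen for the example. It shows that concatenation along the channel axis grows the channel count from C to 1+C, whereas an additive residual connection requires identical shapes and leaves the shape unchanged.

```python
def shape(tensor):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def concat_channels(e_prime, c2):
    """Concatenate two (batch, channels, length) nested lists along axis 1."""
    if len(e_prime) != len(c2):
        raise ValueError("batch sizes differ")
    out = []
    for e_sample, c_sample in zip(e_prime, c2):
        if len(e_sample[0]) != len(c_sample[0]):
            raise ValueError("embedding lengths differ")
        # Stacking the channel lists keeps every original channel intact.
        out.append(e_sample + c_sample)
    return out


def add_residual(x, fx):
    """Standard additive skip connection: shapes must match exactly."""
    if shape(x) != shape(fx):
        raise ValueError("additive residual needs identical shapes")
    return [
        [[a + b for a, b in zip(x_row, f_row)] for x_row, f_row in zip(x_s, f_s)]
        for x_s, f_s in zip(x, fx)
    ]


# Hypothetical sizes: batch B, convolutional channels C, embedding length d.
B, C, d = 2, 4, 8
e_prime = [[[0.0] * d for _ in range(1)] for _ in range(B)]  # B x 1 x d
c2 = [[[1.0] * d for _ in range(C)] for _ in range(B)]       # B x C x d

o = concat_channels(e_prime, c2)
print(shape(o))  # (2, 5, 8), i.e. B x (1 + C) x d
```

Calling `add_residual(e_prime, c2)` on the same inputs raises a `ValueError`, because a B×1×d tensor cannot be added element-wise to a B×C×d tensor in this sketch.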

<COMMENT #1-4> 4. Details about hyperparameter selection, learning rate tuning, and batch sizes should be more explicitly stated or summarized in a table.

<RESPONSE #1-4> Thank you for this comment. We have ensured that details regarding hyperparameter selection and learning rate tuning are explicitly stated within the 'Experimental Setup and Training Strategy' section of the revised manuscript. Furthermore, the batch sizes used for the experiments are introduced in the preliminary description of the 'Experiment and Results' section of the revised manuscript, before the dataset-specific results. We believe these sections now provide the necessary clarity on these parameters.

<COMMENT #1-5> 5. The methods for computing FLOPs and model size should be clearly described.

<RESPONSE #1-5> Thank you for requesting clarification on the methods used to compute FLOPs and model size. We have added these details to the 'Computational Efficiency Analysis' subsection within the 'Experiment and Results' section of our revised manuscript.

Specifically:

FLOPs (Floating Point Operations): FLOPs were calculated using the thop library. This calculation was performed by providing a single sample tensor with dimensions (1, spectrum_dim) (where spectrum_dim is the length of the binned mass spectrum before MSMCE processing) as input to the profile function. FLOPs are reported in GigaFLOPs (GFLOPs).

Model Size (Number of Parameters): The 'model size' reported in our study refers to the Estimated Total Size (MB) of GPU memory occupied by the model during one forward and one backward pass. This was obtained using the torchinfo.summary function, with an input size corresponding to the batch size used during training (i.e., (batch_size, spectrum_dim)). This metric provides an estimate of the peak memory footprint during training.
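As a rough illustration of what such profilers tally, the sketch below counts multiply-accumulate operations and parameters by hand for a small stack of layers. It does not call thop or torchinfo and does not reproduce their counting rules. The layer sizes (spectrum_dim, d, C), the 'same' padding and the helper names are all assumptions made for this example. The two convolutions with C/2 and C output channels and kernel size 3 follow the kernel shapes mentioned in the reviewer comments.

```python
def linear_macs(in_features, out_features):
    """Multiply-accumulates for one sample through a fully connected layer."""
    return in_features * out_features


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias terms of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def conv1d_macs(in_channels, out_channels, kernel_size, out_length):
    """Multiply-accumulates for one sample through a 1D convolution."""
    return in_channels * out_channels * kernel_size * out_length


def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional bias terms of a 1D convolution."""
    return in_channels * out_channels * kernel_size + (out_channels if bias else 0)


def float32_megabytes(n_values):
    """Memory needed to store n_values 32-bit floats, in MB (2**20 bytes)."""
    return n_values * 4 / (1024 ** 2)


# Hypothetical sizes, chosen only for illustration.
spectrum_dim, d, C, k = 10_000, 256, 64, 3

total_macs = (
    linear_macs(spectrum_dim, d)      # encoder: spectrum_dim -> d
    + conv1d_macs(1, C // 2, k, d)    # first convolution, 'same' padding assumed
    + conv1d_macs(C // 2, C, k, d)    # second convolution
)
total_params = (
    linear_params(spectrum_dim, d)
    + conv1d_params(1, C // 2, k)
    + conv1d_params(C // 2, C, k)
)

print(total_macs)    # 4157440
print(total_params)  # 2566592
print(round(float32_megabytes(total_params), 2))  # 9.79
```

Tools differ on whether one multiply-accumulate counts as one or two floating point operations, so hand counts like these need not match a profiler's output exactly. The parameter memory computed here is also only one part of an estimated total size, which additionally covers the input and the activations kept for the forward and backward passes.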

<COMMENT #1-6> 6. The t-SNE plots and radar charts in the supplementary material are helpful but not discussed in the main text. Please integrate interpretations of these visualizations into the Results or Discussion to show how MSMCE improves feature separability and efficiency.

<RESPONSE #1-6> Thank you for this valuable suggestion. We agree that discussing these visualizations in the main text will enhance the manuscript.

t-SNE Visualizations: We have now incorporated a discussion of the t-SNE visualizations (provided in the supplementary material, illustrating the data distributions for each dataset) into the preliminary description of the 'Experiment and Results' section. This addition aims to provide an initial qualitative insight into the class separability of the datasets before the application of our models. While these t-SNE plots are based on the original processed data features (prior to MSMCE embeddings) and thus do not directly show how MSMCE improves feature separability, they offer a baseline understanding of the dataset’s characteristics.

Radar Charts: We would like to clarify that the radar charts, which illustrate model efficiency (FLOPs, model size) in relation to accuracy, are already discussed in detail within the 'Computational Efficiency Analysis' subsection of the 'Experiment and Results' section. This discussion explicitly highlights how MSMCE improves computational efficiency.

<COMMENT #1-7> 7. The manuscript is well-written in general. Minor edits are suggested for typographical consistency (e.g., consistent table formatting, figure references). Also, be precise with terms such as "residual connection".

<RESPONSE #1-7> We appreciate the reviewer's positive feedback on the overall writing. We have carefully reviewed the entire manuscript and made edits to ensure typographical consistency, including table formatting and figure references. Furthermore, we have taken care to use terminology precisely. For instance, following earlier feedback and the reviewer's note, the operation previously described by analogy to a 'residual connection' is now called 'channel concatenation', which describes it more accurately and avoids any misleading association with standard additive residual connections.

<COMMENT #1-8> 8. The references are not cited in sequential order (e.g., the citation sequence jumps from [3–8] to [10], or from [24–26] to [22,32,39]), which may confuse readers and should be corrected for consistency.

<RESPONSE #1-8> Thank you for pointing out the inconsistencies in our citation order. We have thoroughly reviewed and revised the manuscript to ensure that all references are now cited in strict numerical sequence according to their first appearance in the text, and that the reference list is ordered accordingly. This has been corrected throughout the manuscript for consistency.

<COMMENT #1-9> 9. Although ReLU and Dropout are standard components in deep learning, they should be briefly explained for clarity, given that PLOS ONE targets a multidisciplinary audience that may not be familiar with these concepts.

<RESPONSE #1-9> We thank the reviewer for this thoughtful suggestion. We agree that providing brief explanations for standard deep learning components like ReLU and Dropout is beneficial for enhancing the clarity and accessibility of our manuscript to a multidisciplinary readership, which PLOS ONE serves.

In the revised manuscript, we have incorporated concise definitions and the purposes of both ReLU and Dropout. These explanations have been added to the 'Materials and Methods' section, within the subsection 'Fully Connected Encoder', where these components are first introduced as part of our MSMCE module's architecture. We believe this will help readers who may not be deeply familiar with these specific deep learning techniques to better understand their roles within our proposed model. Specifically, we have clarified that:

ReLU (Rectified Linear Unit) is an activation function that introduces non-linearity into the model, outputting the input directly if it is positive, and zero otherwise, which helps with issues like vanishing gradients and computational efficiency.

Dropout is a regularization technique used during training where a proportion of neuron outputs in a layer are randomly ignored, which helps prevent overfitting and improves the model's generalization to unseen data.
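
For illustration, the two definitions above can be written as a minimal, framework-free sketch. The function names `relu` and `dropout` are hypothetical, and rescaling the kept values by 1/(1 - p) (often called inverted dropout) is an assumption based on common practice, not a detail stated in the manuscript.

```python
import random


def relu(values):
    """Return each input unchanged if it is positive, and 0.0 otherwise."""
    return [v if v > 0 else 0.0 for v in values]


def dropout(values, p, rng, training=True):
    """Randomly zero a proportion p of the inputs during training.

    Assumption: kept values are rescaled by 1 / (1 - p) so that the
    expected output matches the input (inverted dropout).
    At inference time (training=False) the inputs pass through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Hypothetical activations, used only to exercise the two functions.
activations = [-1.5, 0.0, 2.0, 0.7]
rectified = relu(activations)
dropped = dropout(rectified, p=0.5, rng=random.Random(0))
```

During training, a different random subset of outputs is zeroed on every call, whereas at inference the layer is the identity.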

<COMMENT #1-10> 10. Figures 1 to 3 are of very poor quality, with unreadable text. They should be replaced with higher-resolution versions to ensure legibility and clarity.

<RESPONSE #1-10> We apologize for the issues with the figure quality experienced by the reviewer. We believe this may be due to the compression of images when the journal generates the review manuscript. We had initially submitted high-resolution figures. For this resubmission, we have re-checked and ensured that Fig 1 to 3 (and all other figures) are provided in high resolution, meeting PLOS ONE’s image guidelines, to ensure their legibility and clarity. We have also utilized the PACE tool as recommended to verify figure compliance.

Reviewer #2

<COMMENT #2-1> 1. The last sentence of the abstract: “Experimental results ...MS data classification.” The link between reduced computational resource and generalizability of the model is not given and trivial.

<RESPONSE #2-1> We thank the reviewer for pointing out the need for greater clarity regarding the link between reduced computational resources/enhanced training efficiency and model generalizability in the abstract. We agree that this connection is not direct or self-evident and that our original phrasing might have inadvertently implied a direct causal relationship.

Our intention was to highlight that while the primary contributions of MSMCE are significant improvements in classification performance and computational efficiency, the latter (enhanced efficiency) can indirectly facilitate the development of more generalizable models. For instance, reduced training time and lower resource consumption allow for more extensive hyperparameter tuning, cross-validation, and experimentation with different model architectures or larger datasets, all of which are crucial for building models that generalize better to unseen data.

However, to avoid any misinterpretation and to maintain a precise focus on the direct, demonstrated benefits, we have revised the concluding part of the abstract. The revised sentence now emphasizes the demonstrated effectiveness in classification and efficiency, and more cautiously suggests its potential for broader applicability, without making a strong, unproven claim about directly enhancing generalizability due to resource reduction.

The revised sentence in the abstract now reads: 'Experimental results on four public datasets demonstrate that the proposed MSMCE module not only achieves substantial improvements in classification performance but also enhances computational efficiency and training stability, highlighting its effectiveness in raw MS data classification and its potential for robust application across diverse datasets.'

<LOCATION #2-1> Abstract section, final sentence.

<COMMENT #2-2> 2. This sentence in the introduction: “The large ... cancer detection”. The challenges need to be explicitly included. Explicitly including the challenges helps to develop the motivation of the paper.

<RESPONSE #2-2> We thank the reviewer for this valuable suggestion. We agree that explicitly listing the challenges associated with MS data analysis in the introduction will better establish motivation for our

Attachment

Submitted filename: Response to Reviewers.docx

pone.0321239.s004.docx (78.8KB, docx)

Decision Letter 1

Hirenkumar Mewada

19 Jun 2025

PONE-D-25-08088R1

MSMCE: A Novel Representation Module for Classification of Raw Mass Spectrometry Data

PLOS ONE

Dear Dr. Xiong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 03 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Hirenkumar Kantilal Mewada

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have provided a comprehensive and thoughtful revision. All reviewer concerns were addressed appropriately and with substantial improvements to both the clarity and rigor of the manuscript. Key contributions such as the statistical validation of performance improvements, clarification of convolutional architecture and terminology, and enhanced methodological detail, strengthen the manuscript considerably.

Reviewer #2: The reported p-values appear inconsistent and potentially incorrect. All five instances report the exact same value (p = 0.03125), regardless of the magnitude of performance improvement. For example, in Table 5, the accuracy of the Transformer model on the Canine Sarcoma dataset increases markedly from 0.78 ± 0.00 to 0.99 ± 0.01 — a substantial improvement that would typically yield a much lower p-value (e.g., p < 0.01) if statistical significance were properly assessed. The uniformity of the reported p-values across different metrics and datasets raises concerns about the validity of the statistical testing procedure. Clarification is needed on how these p-values were computed and whether appropriate statistical tests were applied in each case.

As noted in the previous round, the performance values reported in Tables 2, 3, and 4 — such as 0.25, 0.33, and 0.4 — are notably lower than what would be expected from a random classifier in a binary classification task. This suggests that the number of model training iterations may have been insufficient to yield a reliable evaluation. The authors report using 6-fold cross-validation (k=6); however, this was applied to a single 90/10 train-test split, meaning the same data partitioning was used throughout the evaluation (i.e., n=1). Given the class imbalance evident in the t-SNE plots provided in the Supplementary Materials, repeating the 90/10 split multiple times is essential to ensure a fair comparison between the baseline and enhanced models. Each repetition can be followed by k-fold cross-validation (potentially with a lower k if computational feasibility is a concern). This approach helps mitigate the impact of any train-test split, smoothing out performance fluctuations caused by randomness in the data. For instance, a low score such as 0.25 from one split may be balanced by higher scores in others, leading to a more representative average.

The p-values in the tables appear to be reported selectively and not consistently across all model comparisons. Specifically, only three p-values are provided: the accuracy of ResNet-50 on the NSCLC dataset, the accuracy of DenseNet-121 on the CRLM dataset, and the F1-score of DenseNet-121 on the RCC dataset. Reporting statistical significance solely for accuracy — without corresponding significance measures for other relevant metrics such as precision or recall — provides an incomplete assessment of model performance. This is particularly concerning in imbalanced classification tasks, where improvements in accuracy may mask critical deficiencies, such as an increase in false positives or false negatives. To ensure scientific rigor and transparency, all key performance metrics reported in the tables — especially those being used to support claims of improvement — should be accompanied by appropriate statistical tests and corresponding p-values. Furthermore, if a reported result is not statistically significant, it should be clearly labeled as such or omitted to avoid misinterpretation.

Line 646: “FLOPs of all these models significantly decrease”, the p-value of the significance, the test name, and the sample size for the test are missing for this claim.

Minor issues:

In the supporting material, the dots on the figures need to be transparent and with a narrow margin for each marker. The points are obstructing each other.

The test for one p-value is reported as the signrank test, but the test name is missing for the others. Adding a sentence stating that the same test was used to calculate all significance levels would help. Please also include the number of samples used to calculate each p-value (probably k=6).

Lines 542–546: The term “training crash” is not clearly defined or quantified, making it difficult to interpret its impact or relevance. I recommend moving lines 542–554 to the Discussion section, as this content is largely descriptive and does not report any measurable or validated performance metrics. Relocating this section would improve the logical structure of the Results section by keeping it focused on objective findings.

Line 547: The use of the term “excellent” to describe the model’s performance is not appropriate in the context of a scientific report. Descriptive terms such as this are subjective and should be replaced with objective, metric-based statements that accurately reflect the quantitative results.

Line 580: “Thie” seems to be a typo.

Line 433: Using the term “Experimental” is not appropriate for this section; the same applies to line 489. The model fitting and testing were performed in silico, and no experiment was performed in a lab.

In figure 4, the dashed lines are reporting accuracy over “val”. However, the term “val” needs to be replaced with “test”, as no k-fold validation has been used for the ablation study, and the performance is measured over test set. Similarly, the term “validation performance” needs to be corrected as “performance over the test set”.

Line 644, abbreviation of term FLOP has been repeated twice.

Reviewer #3: I would like to thank the authors for their careful and comprehensive revisions in response to my previous comments. All of the concerns I raised have been thoroughly addressed, and the manuscript has been substantially improved in both clarity and scientific rigor.

I have two additional recommendations that, while not essential, may further enhance the quality and readability of the manuscript:

1. Inclusion of Supplementary Figures (A to C):

The t-SNE visualizations currently provided in the Supplementary Information (Figures A to C) offer valuable insights into the intrinsic structure and difficulty of the classification tasks associated with each dataset. If feasible, I recommend incorporating these figures into the main manuscript to strengthen the presentation of the dataset characteristics and facilitate the reader’s understanding of the challenges involved.

2. Presentation of Bootstrap Confidence Intervals:

Supplementary Table S3, which reports 95% bootstrap confidence intervals for the mean performance metrics, provides important information on the robustness of the results. I encourage the authors to consider including these intervals directly in the main performance tables of the manuscript (e.g., Tables 2 to 6), so that the uncertainty associated with each metric is readily visible to the reader.

These are minor suggestions aimed at improving the accessibility and completeness of the manuscript. I commend the authors for their significant efforts and support the publication of the revised version, with or without the incorporation of these final recommendations.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #1: Yes:  Raquel Cumeras

Reviewer #2: Yes:  Amir Akbarian

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Aug 6;20(8):e0321239. doi: 10.1371/journal.pone.0321239.r004

Author response to Decision Letter 2


26 Jun 2025

We have uploaded the Response to Reviewers.docx. All responses are in this file.

Thank you for your letter dated June 19, 2025, and for the opportunity to revise our manuscript titled "MSMCE: A Novel Representation Module for Classification of Raw Mass Spectrometry Data" (PONE-D-25-08088). We also extend our sincere gratitude to the reviewers for their insightful comments and constructive suggestions, which have been invaluable in improving the quality and clarity of our paper.

We have carefully considered all the points raised by the Academic Editor and the reviewers. We believe that the revisions made have substantially strengthened the manuscript. We have addressed each comment point by point in the "Response to Reviewers" file, and for ease of review, we have highlighted the revisions in bold. The changes in the revised manuscript have been marked using the "Track Changes" feature in Microsoft Word, as requested. We have also uploaded a clean, unmarked version of the revised manuscript.

We hope that the revised manuscript is now suitable for publication in PLOS ONE.

<COMMENT #2-1> 1. The reported p-values appear inconsistent and potentially incorrect. All five instances report the exact same value (p = 0.03125), regardless of the magnitude of performance improvement. For example, in Table 5, the accuracy of the Transformer model on the Canine Sarcoma dataset increases markedly from 0.78 ± 0.00 to 0.99 ± 0.01 — a substantial improvement that would typically yield a much lower p-value (e.g., p < 0.01) if statistical significance were properly assessed. The uniformity of the reported p-values across different metrics and datasets raises concerns about the validity of the statistical testing procedure. Clarification is needed on how these p-values were computed and whether appropriate statistical tests were applied in each case.

<RESPONSE #2-1> We sincerely thank the reviewer for their meticulous observation and for raising this important question regarding our p-value results. We completely understand why the recurrence of the exact same p-value (p = 0.03125, rounded to 0.03 in the current manuscript) across multiple instances would raise concerns about its validity. We would like to take this opportunity to clarify our statistical testing procedure. We confirm that all reported p-values were computed using the paired Wilcoxon signed-rank test on the performance scores from each of the 6 folds of our cross-validation (K=6). Therefore, the sample size for each test was N=6.

Most crucially, the reason this identical p-value appears across instances with different magnitudes of performance improvement is a direct consequence of the properties of the Wilcoxon signed-rank test itself. As a non-parametric test, its calculation relies on the signs and ranks of the paired differences, not on their raw magnitudes. In every instance where the p-value was reported as 0.03125, our proposed MSMCE-enhanced model outperformed its corresponding baseline model in all 6 folds of the cross-validation. For a Wilcoxon signed-rank test with a sample size of N=6, the scenario where all 6 differences share the same sign (e.g., all positive) constitutes the most extreme and consistent result that the test can detect. In this "perfect sweep" situation, the test will always yield its minimum possible p-value, regardless of whether the performance improvement in each fold was marginal or substantial. For a two-sided test, this minimum p-value is precisely 0.03125. The detailed description is as follows:

Test Setup and Hypotheses

To clearly elaborate on our p-value calculation process, we detail here the specific steps of the paired Wilcoxon signed-rank test as applied in our case. As stated in the manuscript, we compare the paired performance scores (sample size N=6) of the "models integrated with MSMCE" against their "baseline" counterparts from the 6-fold cross-validation. The null hypothesis (H_0) for this test is that there is no difference in performance between the two models (i.e., the median of the performance differences is zero). Our two-sided test corresponds to the alternative hypothesis (H_a) that a difference in performance does exist (i.e., the median of the differences is not zero).

Calculating the Test Statistic (W)

The calculation begins by taking the difference d for each of the 6 pairs of performance scores. Subsequently, we rank the absolute values of these differences, |d|, from 1 to 6. The original signs (+ or -) are then reassigned to their corresponding ranks, and we separately calculate the sum of the positive ranks (W^+) and the sum of the negative ranks (W^-). Finally, the test statistic W is defined as the minimum of W^+ and W^-. The value of this W statistic reflects the consistency of the difference between the two groups.
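
As a minimal sketch of the steps just described, the statistic can be computed as follows. The function name `signed_rank_statistic` and the fold-wise scores are hypothetical and are not taken from the manuscript; the sketch also assumes that there are no zero differences and no tied absolute differences, so that plain integer ranks can be used.

```python
def signed_rank_statistic(baseline, enhanced):
    """Return (W_plus, W_minus, W) for paired scores.

    Simplifying assumption: no zero differences and no ties among the
    absolute differences, so ranks are simply 1..N.
    """
    diffs = [e - b for b, e in zip(baseline, enhanced)]
    # Indices of the pairs, ordered by increasing absolute difference.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = 0
    w_minus = 0
    for rank, i in enumerate(order, start=1):
        if diffs[i] > 0:
            w_plus += rank   # rank carries a positive sign
        else:
            w_minus += rank  # rank carries a negative sign
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical fold-wise accuracies for N = 6 folds (illustration only).
baseline = [0.70, 0.72, 0.68, 0.75, 0.71, 0.69]
enhanced = [0.80, 0.79, 0.81, 0.83, 0.77, 0.74]
w_plus, w_minus, w = signed_rank_statistic(baseline, enhanced)
# Every difference is positive here, so W_plus = 1 + 2 + ... + 6 = 21 and W = 0.
```

Because only the ordering of the absolute differences enters the ranks, scaling all improvements up or down leaves W unchanged.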

Principle of the Two-Sided p-value Calculation

The p-value represents the probability of observing a W statistic as extreme as, or more extreme than (i.e., closer to 0), our calculated W, assuming the null hypothesis (no difference in model performance) is true. For a sample size of N=6, there are 2^6=64 possible combinations of signs for the ranks. Because we are conducting a two-sided test, we consider extreme outcomes in both directions: a very small W (indicating the MSMCE model is consistently better) and a very large W (indicating the baseline model is consistently better). The p-value is the sum of the probabilities of these two extreme scenarios.

Specific Calculation for p = 0.03125

The recurring p-value of 0.03125 in our manuscript corresponds to the most extreme result possible for this test, where the test statistic W=0. In our comparisons, this scenario occurs only when one condition is met: the model integrated with MSMCE outperforms the baseline model in every one of the 6 folds. In this case, all 6 performance differences are positive, so the sum of negative ranks (W^-) equals 0 and the test statistic W is therefore also 0. Under the null hypothesis, the probability of this outcome is (1/2)^6 = 1/64. For this "perfect sweep" situation, the two-sided p-value is calculated as: (Probability of all differences being positive) + (Probability of all differences being negative) = (1/64) + (1/64) = 2/64 = 0.03125. This explains why this fixed, minimum p-value is obtained whenever a consistent improvement is observed, regardless of the magnitude of that improvement.

Therefore, the uniformity of the p-value is not an artifact of an incorrect procedure. On the contrary, it serves as strong evidence that the performance enhancement from our proposed MSMCE module is exceptionally robust and consistent across all data splits in those specific comparisons.
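
The minimum p-value derived in this response can be checked by brute-force enumeration of all 2^6 sign assignments. The sketch below is illustrative only: the function name `exact_two_sided_p` is hypothetical, and it assumes untied, non-zero differences so that the ranks are exactly 1 to N.

```python
from itertools import product


def exact_two_sided_p(w_observed, n):
    """Exact two-sided p-value for W = min(W_plus, W_minus) with n pairs.

    Under the null hypothesis every rank is positive or negative with
    probability 1/2, so all 2**n sign assignments are equally likely.
    """
    ranks = range(1, n + 1)
    total = n * (n + 1) // 2  # sum of all ranks
    count = 0
    for signs in product((False, True), repeat=n):
        w_plus = sum(r for r, positive in zip(ranks, signs) if positive)
        w = min(w_plus, total - w_plus)
        if w <= w_observed:  # as extreme as, or more extreme than, observed
            count += 1
    return count / 2 ** n


p_min = exact_two_sided_p(0, 6)
print(p_min)  # 0.03125
```

With N = 6, only the two assignments in which all signs agree give W = 0, so no two-sided p-value below 2/64 is attainable with this test at this sample size.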

<COMMENT #2-2> 2. As noted in the previous round, the performance values reported in Tables 2, 3, and 4 — such as 0.25, 0.33, and 0.4 — are notably lower than what would be expected from a random classifier in a binary classification task. This suggests that the number of model training iterations may have been insufficient to yield a reliable evaluation. The authors report using 6-fold cross-validation (k=6); however, this was applied to a single 90/10 train-test split, meaning the same data partitioning was used throughout the evaluation (i.e., n=1). Given the class imbalance evident in the t-SNE plots provided in the Supplementary Materials, repeating the 90/10 split multiple times is essential to ensure a fair comparison between the baseline and enhanced models. Each repetition can be followed by k-fold cross-validation (potentially with a lower k if computational feasibility is a concern). This approach helps mitigate the impact of any train-test split, smoothing out performance fluctuations caused by randomness in the data. For instance, a low score such as 0.25 from one split may be balanced by higher scores in others, leading to a more representative average.

<RESPONSE #2-2> We sincerely thank the reviewer for this insightful and important comment. The concern regarding whether the low performance values (e.g., 0.25, 0.33, 0.4) were artifacts of a specific data split is entirely valid. A more robust validation protocol involving multiple independent splits is crucial for ensuring the reliability of our conclusions.

To that end, we have diligently followed your valuable recommendation and conducted a new, more comprehensive set of validation experiments. Specifically:

Multiple Independent Splits: We focused on the baseline Transformer model's performance on the NSCLC, CRLM, and RCC datasets. We repeated the 90/10 train-test split using three different random seeds (3407, 42, and 1234).

Nested Cross-Validation: For each of these independent splits, we subsequently performed a full 6-fold cross-validation on the corresponding training set.
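
The protocol in the two items above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the helper names, the sample count of 600, and the plain (non-stratified) shuffling are assumptions, while the seeds (3407, 42, 1234), the 90/10 split, and the 6 folds come from the description above.

```python
import random


def train_test_split_indices(n_samples, test_fraction, seed):
    """Shuffle sample indices with a fixed seed and split off a test set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_indices(indices, k):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


n_samples = 600  # hypothetical dataset size, for illustration only
runs = []
for seed in (3407, 42, 1234):
    train_idx, test_idx = train_test_split_indices(n_samples, 0.1, seed)
    for fold, (fit_idx, val_idx) in enumerate(k_fold_indices(train_idx, 6), start=1):
        # A model would be trained on fit_idx and evaluated on val_idx here.
        runs.append((seed, fold, len(fit_idx), len(val_idx), len(test_idx)))
```

Each seed produces a different 90/10 partition, and the cross-validation folds are drawn only from the training part of that partition.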

NSCLC

Seed=3407
Fold  Accuracy  Precision  Recall  F1 Score
1     0.517062  0.258531   0.5     0.340831
2     0.482938  0.241469   0.5     0.325663
3     0.517062  0.258531   0.5     0.340831
4     0.482938  0.241469   0.5     0.325663
5     0.482938  0.241469   0.5     0.325663

Seed=42
Fold  Accuracy  Precision  Recall  F1 Score
1     0.490457  0.245228   0.5     0.329065
2     0.490457  0.245228   0.5     0.329065
3     0.509543  0.254772   0.5     0.337548
4     0.509543  0.254772   0.5     0.337548
5     0.490457  0.245228   0.5     0.329065

Seed=1234
Fold  Accuracy  Precision  Recall  F1 Score
1     0.484095  0.242047   0.5     0.326189
2     0.515905  0.257953   0.5     0.340328
3     0.484095  0.242047   0.5     0.326189
4     0.484095  0.242047   0.5     0.326189
5     0.484095  0.242047   0.5     0.326189

CRLM

Seed=3407
Fold  Accuracy  Precision  Recall  F1 Score
1     0.49542   0.24771    0.5     0.331292
2     0.49542   0.24771    0.5     0.331292
3     0.50458   0.25229    0.5     0.335362
4     0.50458   0.25229    0.5     0.335362
5     0.49542   0.24771    0.5     0.331292

Seed=42
Fold  Accuracy  Precision  Recall  F1 Score
1     0.500694  0.250347   0.5     0.333642
2     0.499306  0.249653   0.5     0.333025
3     0.500694  0.250347   0.5     0.333642
4     0.499306  0.249653   0.5     0.333025
5     0.500694  0.250347   0.5     0.333642

Seed=1234
Fold  Accuracy  Precision  Recall  F1 Score
1     0.494588  0.247294   0.5     0.330919
2     0.505412  0.252706   0.5     0.33573
3     0.505412  0.252706   0.5     0.33573
4     0.494588  0.247294   0.5     0.330919
5     0.505412  0.252706   0.5     0.33573

RCC

Seed=3407
Fold  Accuracy  Precision  Recall  F1 Score
1     0.314341  0.15717    0.5     0.239162
2     0.685659  0.34283    0.5     0.40676
3     0.685659  0.34283    0.5     0.40676
4     0.685659  0.34283    0.5     0.40676
5     0.685659  0.34283    0.5     0.40676

Seed=42
Fold  Accuracy  Precision  Recall  F1 Score
1     0.321383  0.160691   0.5     0.243217
2     0.678617  0.339309   0.5     0.404272
3     0.678617  0.339309   0.5     0.404272
4     0.678617  0.339309   0.5     0.404272
5     0.678617  0.339309   0.5     0.404272

Seed=1234
Fold  Accuracy  Precision  Recall  F1 Score
1     0.68694   0.34347    0.5     0.407211
2     0.68694   0.34347    0.5     0.407211
3     0.68694   0.34347    0.5     0.407211
4     0.68694   0.34347    0.5     0.407211
5     0.68694   0.34347    0.5     0.407211

The results from these new experiments (3 independent splits × 6 folds = 18 runs) are highly consistent with our original findings. This strongly indicates that the observed low scores are not an artifact of a specific data partition. Rather, they reflect the insufficient representational power of the standard Transformer model when it directly processes such high-dimensional and complex raw mass spectrometry data.

To further validate this conclusion, we investigated the underlying reason for the poor performance. We hypothesized that these extremely low scores arise because the model enters a state of "mode collapse", in which it ceases to learn and instead predicts a single class for all samples. To provide direct evidence for this, we have now added confusion matrices for these low-performing baseline models to the manuscript. These confusion matrices show the model's extremely skewed predictions and offer intuitive proof that mode collapse has occurred.
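The per-fold scores listed above follow the pattern expected from a classifier that predicts one class for every sample. The sketch below illustrates this under two assumptions that the response does not state explicitly: that Precision, Recall, and F1 are macro-averaged over two classes, and that a class that is never predicted receives a precision of 0. The function names `macro_metrics` and `collapsed_scores` are hypothetical, and the toy labels are made up for illustration.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro-averaged precision, recall and F1.

    Assumption: a class with no predicted samples gets precision 0.
    """
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


def collapsed_scores(accuracy):
    """Macro (precision, recall, F1) of a binary single-class predictor.

    The predicted class has precision = accuracy and recall = 1;
    the other class contributes zeros to every average.
    """
    return accuracy / 2, 0.5, accuracy / (1 + accuracy)


# Toy data: 6 samples of class 1, 4 of class 0, and a model that always says 1.
scores = macro_metrics([1] * 6 + [0] * 4, [1] * 10)
print(scores)

# Compare with NSCLC, seed 3407, fold 1 (accuracy 0.517062 in the table).
print(collapsed_scores(0.517062))
```

Under these assumptions, a reported accuracy of 0.517062 implies a precision of about 0.258531, a recall of exactly 0.5, and an F1 score of about 0.340831, which are the values in the first NSCLC row.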

In summary, we are very grateful for your suggestion. It has not only pushed us to strengthen the reliability of our study with more rigorous validation, but has also led us to provide a deeper and clearer explanation for the baseline models' performance bottleneck through the inclusion of confusion matrices.

<COMMENT #2-3> 3. The p-values in the tables appear to be reported selectively and not consistently across all model comparisons. Specifically, only three p-values are provided: the accuracy of ResNet-50 on the NSCLC dataset, the accuracy of DenseNet-121 on the CRLM dataset, and the F1-score of DenseNet-121 on the RCC dataset. Reporting statistical significance solely for accuracy — without corresponding significance measures for other relevant metrics such as precision or recall — provides an incomplete assessment of model performance. This is particularly concerning in imbalanced classification tasks, where improvements in accuracy may mask critical deficiencies, such as an increase in false positives or false negatives. To ensure scientific rigor and transparency, all key performance metrics reported in the tables — especially those being used to support claims of improvement — should be accompanied by appropriate statistical tests and corresponding p-values. Furthermore, if a reported result is not statistically significant, it should be clearly labeled as such or omitted to avoid misinterpretation.

<RESPONSE #2-3> We sincerely thank the reviewer for raising this extremely important point. We completely agree that comprehensive and consistent reporting of statistical test results is crucial for ensuring the rigor and transparency of our research, especially when dealing with imbalanced datasets. To systematically address the issues you have raised and to improve the clarity of our manuscript, we have made the following revisions:

Centralized Description of p-value Calculation: We have added a detailed paragraph describing our statistical analysis methodology at the beginning of the "Results" section. This description clarifies that all p-values were calculated using a two-sided, paired Wilcoxon signed-rank test (N=6) on the results from our 6-fold cross-validation. This ensures the uniformity and transparency of our statistical methods.
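As an illustration of the test named above, the sketch below computes an exact two-sided, paired Wilcoxon signed-rank p-value by enumerating all sign assignments, which is feasible for N=6. It is not the authors' analysis code: the function name `wilcoxon_signed_rank_exact` is hypothetical, the fold scores are made-up numbers, and the sketch assumes no zero differences and no tied absolute differences.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided paired Wilcoxon signed-rank test.

    Assumptions: no zero differences and no tied absolute differences.
    Returns (statistic, p_value), where the statistic is the smaller of
    the positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied differences are not handled in this sketch")

    # Rank the absolute differences from 1 (smallest) to n (largest).
    ranks = [abs_sorted.index(abs(d)) + 1 for d in diffs]
    n = len(diffs)
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns that are at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** n


# Made-up per-fold accuracies for a baseline and an enhanced model.
baseline = [0.70, 0.72, 0.68, 0.71, 0.69, 0.73]
enhanced = [0.75, 0.78, 0.75, 0.79, 0.78, 0.83]
result = wilcoxon_signed_rank_exact(enhanced, baseline)
print(result)
```

With six pairs there are 2^6 = 64 sign patterns, so the smallest two-sided p-value this exact test can return is 2/64 = 0.03125, reached only when all six folds move in the same direction.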

Streamlined and Unified Table Content: To improve the readability and aesthetic presentation of the tables, while focusing on the most representative and comprehensive evaluation metrics, we have streamlined all performance comparison tables (Tables 2-6) in the manuscript. We have removed the display of Precision and Recall, retaining only Accuracy and F1-Score, which we believe are sufficient to reflect the overall performance of the models.

Comprehensive Inclusion of Key Metric p-values: In the streamlined tables, for every Accuracy and F1-Score corresponding to the MSMCE-integrated models, we have now included the p-value from the comparison against the respective baseline model. This ensures that all key performance metrics used to support our claims of improvement are accompanied by their corresponding statistical evidence.

Explanation for Non-significant Results: For any results where the performance improvement was found to be not statistically significant (p ≥ 0.05), we have now included explicit discussions and explanations in the main text of the manuscript. We discuss the potential reasons for this, such as the baseline model already reaching a performance ceiling, thereby avoiding any potential misinterpretation.

We believe that with these revisions—namely, a centralized methodological description, comprehensive p-value reporting for key metrics in tables, and discussion of non-significant results in the text—we have fully and rigorously addressed your concerns. Thank you again for your valuable feedback, which has greatly improved the quality of our manuscript.

<COMMENT #2-4> 4. Line 646: “FLOPs of all these models significantly decrease”, the p-value of the significance, the test name, and the sample size for the test are missing for this claim.

<RESPONSE #2-4> We thank the reviewer for this sharp observation. We would like to clarify that Floating Point Operations (FLOPs) is a deterministic metric, calculated from a fixed model architecture and input size. For any given model, this calculation yields a single,

Attachment

Submitted filename: Response_to_Reviewers_auresp_2.docx

pone.0321239.s005.docx (56.8KB, docx)

Decision Letter 2

Hirenkumar Mewada

16 Jul 2025

MSMCE: A Novel Representation Module for Classification of Raw Mass Spectrometry Data

PONE-D-25-08088R2

Dear Dr. Xiong,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Hirenkumar Kantilal Mewada

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: All my previous concerns have been thoroughly resolved. The revised manuscript is significantly strengthened and presents a clear, well-supported, and insightful contribution to the field. I commend the authors for their rigorous approach and the depth of their analysis. Congratulations on producing such a strong and impactful paper.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes:  Amir Akbarian

Reviewer #3: No

**********

Acceptance letter

Hirenkumar Mewada

PONE-D-25-08088R2

PLOS ONE

Dear Dr. Xiong,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Hirenkumar Kantilal Mewada

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Estimated model size & FLOPs.

    Tabulated summary of estimated peak GPU memory footprint (MB) and Floating-Point Operations (FLOPs) for baseline and MSMCE-enhanced deep learning models.

    (XLSX)

    pone.0321239.s001.xlsx (17.2KB, xlsx)
    S2 File. Experiments Bootstrap confidence intervals.

    Bootstrap 95% confidence intervals for mean performance metrics (Accuracy, Precision, Recall, F1-Score) from k-fold cross-validation on the hold-out test set for all evaluated models.

    (XLSX)

    pone.0321239.s002.xlsx (20KB, xlsx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0321239.s004.docx (78.8KB, docx)
    Attachment

    Submitted filename: Response_to_Reviewers_auresp_2.docx

    pone.0321239.s005.docx (56.8KB, docx)

    Data Availability Statement

    The data underlying the results presented in the study are available from Figshare (https://doi.org/10.6084/m9.figshare.29148629.v1).


    Articles from PLOS One are provided here courtesy of PLOS
