Highlights
• XAI–SERS platform enables accurate discrimination of Escherichia coli pathotypes and Shigella.
• 1D-CNN model achieved 97.7% accuracy, surpassing traditional classifiers.
• SHAP analysis identified key spectral features linked to molecular components.
• Provides a precise, interpretable approach for bacterial diagnostics.
Keywords: Explainable AI, XAI, Pathogenic E. coli, Shigella species, Surface-enhanced Raman spectroscopy, Bacterial discrimination
Abstract
Pathogenic Escherichia coli and Shigella species cause severe diarrheal diseases with high mortality but remain difficult to distinguish using conventional methods due to their close genetic and proteomic relatedness. To address this challenge, we propose a platform that combines explainable artificial intelligence (XAI) with surface-enhanced Raman spectroscopy (SERS) for rapid and accurate identification of E. coli pathotypes and Shigella species. We generated 7819 SERS spectra from 294 strains, including 195 representing five pathotypes of E. coli and 99 of Shigella species. This dataset was analyzed within an XAI framework using deep learning models, including a one-dimensional convolutional neural network (1D-CNN) and a multilayer perceptron, and compared with traditional machine learning classifiers. The 1D-CNN achieved 97.7% accuracy, outperforming conventional classifiers. SHapley Additive exPlanations (SHAP) analysis revealed the specific spectral features and molecular components contributing to classification, providing biochemical interpretability. This study demonstrates the potential of XAI–SERS for precise, explainable identification of E. coli pathotypes and Shigella species.
1. Introduction
Bacterial pathogens are a primary cause of death globally, responsible for >7.7 million fatalities each year (Ikuta et al., 2022). Among these pathogens, Escherichia coli (E. coli) stands out as a major cause of diarrhea and hemorrhagic colitis, affecting >111 million people and causing almost 63,000 deaths annually worldwide (Havelaar et al., 2015). Pathogenic E. coli can be classified into the following five distinct pathotypes: enterotoxigenic E. coli (ETEC), Shiga toxin–producing E. coli (STEC), enteropathogenic E. coli (EPEC), enteroaggregative E. coli (EAEC), and enteroinvasive E. coli (EIEC) (Gomes et al., 2016). These bacteria use a diverse array of virulence mechanisms, including toxin production, epithelial adhesion, and host cell invasion, to establish colonization, disrupt intestinal barrier function, and elicit severe clinical manifestations in the host (Peng et al., 2024).
The highly contagious and pathogenic bacterium E. coli is closely related to Shigella, which accounts for approximately 188 million cases of severe dysentery worldwide each year (Kotloff et al., 2018). Shigella infections are the second-most common cause of diarrhea-related mortality worldwide, resulting in an estimated mortality of 55,000 deaths annually among children aged <5 years (Troeger et al., 2018). S. sonnei, S. flexneri, S. boydii, and S. dysenteriae are notorious for causing dysentery, with a remarkably low infectious dose of only 10 to 100 organisms being sufficient to produce disease manifestations (Bennish and Ahmed, 2020; Zaidi and Estrada-García, 2014). These E. coli pathotypes and Shigella species possess virulence markers, which contribute to their clinical pathways for disease induction in the host, as depicted in their genomic organization (Croxen et al., 2013; Kaper et al., 2004; Pakbin et al., 2021) (Fig. 1).
Fig. 1.
Circular genome maps of Shigella flexneri 2457T, Shigella boydii 602144, Shigella dysenteriae ATCC 13313, Shigella sonnei ATCC 25931, E. coli (EAEC) NCCP 14039, E. coli (EAEC) E1–34, E. coli (STEC) NCCP 15655, E. coli (ETEC) NCCP 15740, and E. coli (EPEC) E2348/69, highlighting pathogenic gene distributions and visualized using the CGView tool.
Historically, Shigella was considered part of the diverse E. coli species, and in the 1940s, it was classified as a separate genus owing to its distinct clinical significance (Lan and Reeves, 2002). Nevertheless, subsequent molecular studies revealed phenotypic and genotypic similarities between Shigella and E. coli so extensive that some researchers have proposed placing Shigella within the genus Escherichia (Halimeh et al., 2021; Zuo et al., 2013). Despite this close genetic relatedness, accurate differentiation between E. coli pathotypes and Shigella species remains clinically important. Although infections caused by these organisms often present with similar gastrointestinal symptoms, their clinical management and public health responses can differ. Shigella infections are typically associated with severe dysentery and are subject to strict surveillance owing to their extremely low infectious dose and high transmissibility (Bennish and Ahmed, 2020). In contrast, certain E. coli pathotypes such as STEC require different therapeutic considerations because antibiotic treatment may increase the risk of hemolytic uremic syndrome, whereas antibiotic therapy is generally recommended for shigellosis to shorten disease duration and reduce transmission (Freedman et al., 2016; Kotloff et al., 2018). Consequently, misidentification between these closely related organisms may lead to inappropriate treatment decisions and delayed infection control.
Traditional bacterial classification methods, such as 16S rRNA sequencing, DNA-based techniques, and matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) mass spectrometry, often fail to adequately differentiate between these organisms (Kotłowski et al., 2020; Martiny et al., 2012; M. van den Beld and F. Reubsaet, 2012; van den Beld et al., 2022; Zimmermann et al., 2020). Such limitations can result in considerable diagnostic delays, occasional misidentifications, and inappropriate antibiotic prescriptions, all of which pose substantial public health challenges (Fleming-Dutra et al., 2016; Shen et al. 2025). Consequently, more accurate and rapid methods are urgently needed to distinguish these pathogens and improve clinical outcomes and disease control measures (Thrift et al. 2020).
Surface-enhanced Raman spectroscopy (SERS) is based on the principle that inelastic scattering of photons by vibrating molecules produces a unique vibrational Raman spectrum, and that this signal is strongly amplified near the surfaces of metals such as gold and silver (Aydin et al., 2009; Tang et al., 2021). SERS has been extensively studied in the medical and bacterial fields as a rapid, noninvasive, and cost-effective technique (Galvan and Yu, 2018; Kahraman and Wachsmann-Hogiu, 2015; Kim et al., 2025; Bi et al., 2023; Huang et al., 2025). SERS can be categorized into label-based and label-free modalities. Label-based approaches employ specific molecular tags or probes to enhance detection selectivity and sensitivity toward predefined targets (Bi et al., 2020). Label-free SERS, by contrast, has gained increasing prominence because it requires minimal sample preparation and enables direct capture of intrinsic molecular features, allowing simpler analysis and real-time monitoring without the need for chemical labels or probes (Lee et al., 2024). However, few spectral differences are observed at the serotype or subspecies level of bacteria, making it challenging to distinguish and classify these fine discrepancies accurately (Liu et al., 2022). Furthermore, SERS faces several challenges, including poor reproducibility of signals and difficulties in ensuring compatibility of spectral datasets across different hardware platforms (Liu et al., 2021a). Nonetheless, SERS remains a valuable tool for bacterial discrimination because it can capture subtle molecular signals originating from peptides, lipids, and metabolites.
Several studies have recently reported the successful application of advanced data analysis to SERS for discriminating between bacterial species (Ciloglu et al., 2021; Jeon et al., 2025; Sun et al., 2023). Sun et al. used a convolutional neural network (CNN) for classifying Salmonella serotypes based on data obtained from Raman spectroscopy (Sun et al., 2023). Ciloglu et al. demonstrated the ability of a deep neural network to differentiate between methicillin-resistant Staphylococcus aureus (S. aureus) and methicillin-sensitive S. aureus using a label-free SERS technique with an accuracy of 97.66% (Ciloglu et al., 2021). Furthermore, Jeon et al. developed a machine learning–integrated label-free SERS platform that, by incorporating data preprocessing techniques, achieved 100% accuracy in classifying four bacterial species (Jeon et al., 2025). Bi et al. developed a paper-based SERS chip integrated with a multi-branch adaptive attention convolutional neural network (MBAA-CNN), achieving 98.6% accuracy in pathogen discrimination and 99.5% accuracy in antibiotic resistance classification, demonstrating the potential of deep learning–assisted SERS platforms for rapid bacterial identification (Bi et al., 2025). Building on these data, we hypothesize that SERS can detect subtle structural differences between E. coli pathotypes and Shigella spp., with advanced deep learning analyses anticipated to improve their classification.
Moreover, advances in interpreting the decision-making processes of AI models have resulted in the emergence of explainable artificial intelligence (XAI), providing greater transparency and trust (Saranya and Subhashini, 2023; Sadeghi et al., 2024). Unlike conventional black-box AI, XAI achieves high accuracy and also reveals the features driving its conclusions, thus enabling interpretability, reliability, and informed decision-making (Buyuktepe et al., 2025). Therefore, we hypothesized that incorporating XAI into SERS would enable the interpretation of spectral features associated with underlying biological components, thereby providing molecular-level insights into the discrimination process.
In this study, we aimed to develop an XAI–SERS platform for accurately differentiating between E. coli pathotypes and Shigella species. We used Au@Ag core-shell nanoparticles (NPs) for collecting SERS spectral data and implemented preprocessing to ensure consistent spectra and minimize experimental errors. We next leveraged deep learning classifiers, including a one-dimensional CNN (1D-CNN) and a multilayer perceptron (MLP), to classify the SERS spectra of the E. coli pathotype and Shigella species and evaluated their performance. In addition, we applied SHapley Additive exPlanations (SHAP) analysis to identify the molecular components that played a vital role in classifying bacterial classes. Collectively, this XAI–SERS framework enables accurate and biologically interpretable discrimination of E. coli pathotypes and Shigella species.
2. Materials and methods
2.1. Bacterial sample preparation
We examined 294 bacterial strains, comprising the E. coli pathotypes STEC (57 strains), ETEC (38 strains), EPEC (56 strains), EAEC (30 strains), and EIEC (12 strains) and the Shigella species S. sonnei (54 strains), S. flexneri (29 strains), S. boydii (11 strains), and S. dysenteriae (5 strains). Reference strains were obtained from the National Culture Collection for Pathogens (Cheongju, Korea), the American Type Culture Collection (Manassas, VA, USA), the Korea Biobank Network (Yongin, Korea), and the Korean Collection for Type Cultures (Jeongeup, Korea) (Table S1). The pathotypes of E. coli and Shigella species were confirmed using biochemical tests conducted by the respective source institutions. Bacterial isolates were collected from clinical samples, food sources (beef, pork, and kimchi), and water. To confirm the pathotypes of E. coli and key virulence gene markers, all isolates were subjected to real-time polymerase chain reaction (Table S2).
Before the experiment, all strains were stored at −80 °C in glycerol stocks and subsequently revived by inoculating onto tryptic soy agar (TSA, MBcell, Seoul, South Korea) plates. Each strain was then inoculated into tryptic soy broth (TSB, MBcell, Seoul, South Korea) and incubated for 16 h at 37 °C, after which the cultures were centrifuged at 4500 rpm for 5 min. The resulting supernatant was discarded, and the cell pellets were washed in distilled water to remove media components. Optical density was measured at 600 nm (OD600) to confirm a bacterial concentration of 10⁸ CFU/mL, suitable for subsequent analysis. The bacterial population was quantified using the cell counting method (Gómez-Rojo et al., 2015). All bacterial handling and sample preparation procedures were conducted under biosafety level 2 containment.
2.2. SERS measurements
SERS measurements were conducted using Au@Ag core-shell nanoparticles according to a previously established protocol (Jeon et al., 2025). The Au@Ag nanoparticle solution (21 μg/mL) and bacterial suspension were mixed at a 2:1 (v/v) ratio and incubated for 30 min at 37 °C. Then, 10 μL of the resulting mixture was applied to silicon wafers and dried at room temperature. SERS spectra were acquired using a QE-PRO spectrometer (Ocean Optics Inc.) equipped with a 785-nm laser at 22 mW. Spectral collection was standardized with an acquisition time of 10 s per measurement. A total of 7819 spectra were collected from multiple randomly selected spots across sample droplets, with 671–966 spectra obtained for each bacterial class (Table S1–3). Spectral acquisition was performed over multiple experimental days, and independently synthesized Au@Ag core–shell nanoparticle batches were used for each measurement.
2.3. Data preprocessing
SERS spectra were truncated to the wavenumber range of 600 to 1600 cm−1 and then processed through the following four primary steps: (1) baseline correction; (2) despiking; (3) data binning; (4) standardization. Baseline correction was performed using the Improved Modified Polynomial Fitting (IModPoly) method with a polynomial order of 11. This method refines polynomial coefficients iteratively to subtract the underlying fluorescence and noise contributions, while retaining the spectral features required for subsequent analyses (Zhao et al., 2007).
We next applied a despiking procedure to detect and correct noise spikes in the Raman spectra (Whitaker and Hayes, 2018). Initially, modified Z-scores were calculated from the once-differentiated spectrum to highlight spikes using a threshold set at 5 to distinguish true spikes from normal variations. Next, the identified spikes were corrected through a moving average of adjacent nonspike points, resulting in smoothed spectral profiles.
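The despiking step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: the function names, the half-window of three points, and the synthetic test spectrum are assumptions introduced here, while the modified Z-score on the once-differenced spectrum and the threshold of 5 follow the text.

```python
import numpy as np

def modified_z_scores(values):
    """Modified Z-scores based on the median and the median absolute deviation."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros_like(values, dtype=float)
    return 0.6745 * (values - median) / mad

def despike(intensity, threshold=5.0, half_window=3):
    """Replace spike points with the mean of nearby non-spike points.

    half_window is a hypothetical choice; the text does not state the window size.
    """
    intensity = np.asarray(intensity, dtype=float)
    flagged = np.abs(modified_z_scores(np.diff(intensity))) > threshold
    # Each flagged difference involves two neighboring samples, so mark both.
    spike = np.zeros(intensity.size, dtype=bool)
    spike[:-1] |= flagged
    spike[1:] |= flagged
    cleaned = intensity.copy()
    for i in np.flatnonzero(spike):
        lo = max(0, i - half_window)
        hi = min(intensity.size, i + half_window + 1)
        window = np.arange(lo, hi)
        good = window[~spike[window]]
        if good.size:  # leave the point untouched if no clean neighbor exists
            cleaned[i] = intensity[good].mean()
    return cleaned

# Synthetic example: a smooth curve with one artificial cosmic-ray spike.
truth = np.sin(np.linspace(0, 4 * np.pi, 200))
spectrum = truth.copy()
spectrum[100] += 10.0
cleaned = despike(spectrum)
```

Marking both samples adjacent to a flagged difference slightly over-flags isolated spikes, which is harmless here because flagged points are simply replaced by the local mean of unflagged neighbors.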
The P-Bin method was used to address peak shifts induced by experimental variability, identifying prominent spectral peaks and aligning them within standardized bins (Chai et al., 2023). Peak-picking algorithms were used to identify prominent spectral peaks, which were then aligned to a master list within a defined tolerance. Bin widths were set to approximately half the full width at half maximum of a reference peak.
The spectra were standardized using the StandardScaler class from the scikit-learn library. This method removes the mean of each feature and scales it to unit variance, rather than rescaling values to a range between 0 and 1.
To avoid potential information leakage, all preprocessing steps that involve learned parameters were applied using training data only. In particular, the StandardScaler was fitted exclusively on the training dataset, and the resulting mean and variance parameters were subsequently applied to the validation and test datasets. The other preprocessing steps, including baseline correction, despiking, and peak binning (P-Bin), were applied independently to each spectrum and do not rely on statistics computed from the full dataset. No global or master peak list was constructed using the combined dataset, ensuring that no information from the validation or test sets was introduced during preprocessing.
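The leakage-free standardization described above can be illustrated with a short sketch. The synthetic array, its dimensions, and the 70/15/15 partition are placeholders chosen for illustration only (the actual split ratio is given in Table S4); the essential point is that `StandardScaler` is fitted on the training partition alone and then reused unchanged.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for binned spectra: 100 spectra x 20 features (assumed sizes).
rng = np.random.default_rng(0)
spectra = rng.normal(loc=50.0, scale=10.0, size=(100, 20))
train, val, test = spectra[:70], spectra[70:85], spectra[85:]

# Fit on the training partition only; mean and variance are learned here.
scaler = StandardScaler().fit(train)

# Apply the same training-derived parameters to every partition.
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)
test_scaled = scaler.transform(test)
```

Under this scheme only the training partition is guaranteed to have exactly zero mean and unit variance per feature; the validation and test partitions will deviate slightly, which is the expected consequence of not using their statistics.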
The complete preprocessing and model training pipeline is available in the public repository (https://github.com/youzh-all/XAI-SERS), ensuring transparency and reproducibility of the implemented procedures.
2.4. Deep learning analysis of SERS spectra
The SERS spectra were classified using two deep learning models, namely the 1D-CNN and the MLP. The 1D-CNN features a convolutional layer with 32 filters, a kernel size of 7, and a rectified linear unit (ReLU) activation function, followed by a max pooling layer with a pool size of 2. This model concludes with one dense layer containing 150 neurons and uses the Adam optimizer with a learning rate of 0.0009. Early stopping was applied to improve generalization, with training limited to 50 epochs. Detailed training configurations, including the train/validation/test split ratio, batch size, random seeds, number of runs, and early stopping settings, are summarized in Table S4. The MLP architecture consists of an input layer, two hidden layers with 100 and 50 neurons, respectively, and an output layer with a softmax activation function, implemented using the MLPClassifier from the scikit-learn library. The model incorporates ReLU for nonlinearity and is trained on scaled data for up to 1000 iterations, with early stopping to prevent overfitting. To compare the performance of these deep learning models with that of traditional classifiers, support vector machine (SVM) and random forest (RF) classifiers were also employed. A comprehensive evaluation of all models was conducted using various performance metrics, including accuracy, precision, recall, and F1 scores, and the best-performing model was further examined using learning and loss curves and a confusion matrix to provide a detailed examination of its discriminative capabilities.
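As an illustration of the MLP configuration described above, the following sketch instantiates scikit-learn's `MLPClassifier` with the stated hidden layer sizes, ReLU activation, iteration limit, and early stopping. The helper name `build_mlp`, the random seed, and the synthetic three-class data are assumptions made for this example; all other hyperparameters are left at library defaults, which may differ from the settings in Table S4.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_mlp(seed=0):
    """MLP with two hidden layers (100 and 50 neurons), as described in the text."""
    return MLPClassifier(
        hidden_layer_sizes=(100, 50),
        activation="relu",
        max_iter=1000,
        early_stopping=True,   # holds out part of the training data internally
        random_state=seed,     # hypothetical seed for reproducibility
    )

# Synthetic, already-scaled data standing in for preprocessed spectra.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 3, 60, 30
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

mlp = build_mlp().fit(X, y)
```

For multiclass targets, `MLPClassifier` applies a softmax output layer automatically, so no explicit output activation needs to be configured.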
To further assess the robustness of the model evaluation and to address potential spectrum-level data leakage, an additional strain-wise grouped evaluation was performed. Because multiple Raman spectra were collected from each bacterial strain, random spectrum-level splitting may result in spectra from the same strain appearing in both training and test datasets.
To avoid this issue, spectra derived from the same bacterial strain were grouped together and assigned exclusively to a single dataset partition (training, validation, or test). In addition, a strain-wise holdout test set was constructed by reserving a subset of bacterial strains as independent test strains, while the remaining strains were used for model training and validation. The detailed strain-wise dataset partition is provided in Table S5.
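A strain-wise partition of this kind can be sketched as below. The function name, the split fractions, and the toy strain labels are hypothetical, and the sketch does not stratify by bacterial class; the partition actually used is the one listed in Table S5.

```python
import numpy as np

def strain_wise_split(strain_ids, test_fraction=0.2, val_fraction=0.1, seed=0):
    """Assign every spectrum of a given strain to exactly one partition."""
    strain_ids = np.asarray(strain_ids)
    rng = np.random.default_rng(seed)
    strains = rng.permutation(np.unique(strain_ids))
    n_test = max(1, int(round(test_fraction * strains.size)))
    n_val = max(1, int(round(val_fraction * strains.size)))
    test_strains = strains[:n_test]
    val_strains = strains[n_test:n_test + n_val]
    test_mask = np.isin(strain_ids, test_strains)
    val_mask = np.isin(strain_ids, val_strains)
    train_mask = ~(test_mask | val_mask)
    return (np.flatnonzero(train_mask),
            np.flatnonzero(val_mask),
            np.flatnonzero(test_mask))

# Toy example: 20 strains with 5 spectra each.
strain_ids = np.repeat(np.arange(20), 5)
train_idx, val_idx, test_idx = strain_wise_split(strain_ids)
```

Because the split operates on unique strain identifiers rather than on individual spectra, no strain can contribute spectra to more than one partition.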
2.5. Feature importance analysis
To further clarify the mechanisms underlying the classification accuracy of our best-performing deep learning model, we conducted a feature importance analysis using SHAP algorithms (Lundberg and Lee, 2017). The SHAP explainer, designed to handle time-series or sequence data such as spectra, was used to interpret the predictions of the model. The SHAP explainer was initialized with the training data and configured to calculate the contribution of each spectral feature to the model’s output. The explanatory model was applied to the test set, and the impact of each feature on the prediction of the model was quantified. The average importance of each feature across multiple samples was evaluated to ensure the robustness of the results. The output from the SHAP analysis provided a list of features ranked according to their importance, and the molecular components corresponding to the characteristic SERS peaks were matched.
A fold-based sensitivity analysis of SHAP rankings was also conducted to assess the robustness of the SHAP-based feature importance. SHAP importance values were computed independently for three fold-specific evaluation subsets, and the resulting feature rankings were compared pairwise. Stability of the rankings was quantified using Spearman rank correlation across all wavenumbers and Jaccard overlap for top-k feature sets (k = 10, 20, 30, 50).
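The two stability measures can be sketched as follows, assuming each fold yields one importance value per wavenumber (for example, a mean absolute SHAP value, which is an assumption here). The function names and the synthetic importance vectors are illustrative; the simple ranking used below does not average tied values, which a production analysis might prefer to handle with a dedicated statistics routine.

```python
import numpy as np
from itertools import combinations

def ranks(values):
    """Rank positions (0 = smallest); ties are broken by order, not averaged."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def jaccard_top_k(a, b, k):
    """Jaccard overlap between the k most important features of two rankings."""
    top_a = set(np.argsort(a)[::-1][:k].tolist())
    top_b = set(np.argsort(b)[::-1][:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)

# Synthetic importance vectors for three folds (200 hypothetical features).
rng = np.random.default_rng(0)
base = rng.random(200)
folds = [base + rng.normal(scale=0.01, size=base.size) for _ in range(3)]

stability = {}
for (i, a), (j, b) in combinations(enumerate(folds), 2):
    stability[(i, j)] = {
        "spearman": spearman(a, b),
        "jaccard": {k: jaccard_top_k(a, b, k) for k in (10, 20, 30, 50)},
    }
```

Spearman correlation summarizes agreement over the full ranking, whereas the top-k Jaccard overlap focuses on whether the same highly ranked wavenumbers recur across folds.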
3. Results and discussion
3.1. Characterization of SERS spectra for pathogenic E. coli and Shigella species
A variety of SERS substrates and NPs are applied; however, metal colloids are particularly preferred in bacterial research due to their extensive ability to interact with bacteria, markedly improving hotspot formation (Kahraman et al., 2011; Huang et al. 2023). Among these, gold (AuNPs) and silver nanoparticles (AgNPs) are frequently used in detecting bacteria because of their superior signal-to-noise ratio characteristics. AgNPs exhibit strong plasmonic properties but have lower stability, whereas AuNPs exhibit high stability (Damm et al., 2011; Stokes et al., 2007). Therefore, in the present study, we used an Au@Ag core-shell structure to exploit the interactions between these NPs, improving signal amplification and stability. The TEM image and UV–vis absorption spectrum (Figure S1A-B) confirmed the successful synthesis and optical properties of the Au@Ag core-shell NPs. The absorption peaks at 497 and 390 nm were attributed to the Au core and Ag shell, respectively (Samal et al., 2013). In addition, the SEM image and comparative Raman/SERS spectra (Figure S1C-D) supported the interaction between Au@Ag NPs and bacteria, which resulted in enhanced signal generation.
Additional optimization experiments were conducted to establish the optimal substrate and workflow conditions for SERS signal acquisition. Au@Ag core-shell nanoparticles produced stronger SERS signals than Au nanoparticles alone (Figure S2A-B), while signal intensity was maximized at 30 min of incubation and an Au@Ag-to-bacteria mixing ratio of 2:1 (v/v) (Figure S2C-F). These conditions were therefore used for subsequent analyses.
Because of the distance dependence of SERS enhancement mechanisms, it has generally been assumed that the vibrational bands observed in the SERS spectra of bacterial cells are dominated by contributions from structural features and metabolites (Kovacs et al., 1986; Premasiri et al., 2016; Samek et al., 2021). Therefore, the prevailing view is that the SERS spectra of bacteria exhibit a diverse array of vibrational modes, originating from cell wall components such as peptidoglycan, lipids, membrane proteins, and nucleic acids and from various metabolites (Gonzalez-Gonzalez et al., 2022; Li et al., 2020; Liu et al., 2021a; Liu et al., 2017; Walter et al., 2011). Although E. coli pathotypes and Shigella spp. share numerous virulence factors and pathogenic mechanisms, they exhibit distinct phenotypic differences in motility, metabolic activity, virulence, and the manner in which they induce clinical pathology in the host (Hendriks et al., 2020; Pasqua et al., 2017). Hence, some structural differences between E. coli pathotypes and Shigella spp. are anticipated, and it is postulated that these differences will be reflected in the SERS signal.
Fig. 2 depicts the SERS spectra of pathogenic E. coli and Shigella species, which exhibit characteristic vibrational peaks at 653 cm−1 (C—C twisting in tyrosine), 734 cm−1 (adenine, DNA), 955 cm−1 (υ(CN)), 1003 cm−1 (phenylalanine ring breathing), 1030 cm−1 (C—H in-plane phenylalanine), 1131 cm−1 (υ(COC) ring breathing), 1331 cm−1 (υ(NH2) adenine, polyadenine), 1376 cm−1 (υ(COO-)), 1461 cm−1 (CH2 deformation lipids), and 1586 cm−1 (amide II, tyrosine, adenine, guanine) (Li et al., 2020; Liu et al., 2017; Walter et al., 2011). However, the peak at 786 cm−1 (cytosine, uracil (ring stretching)) is shared by all E. coli pathotypes and Shigella spp., except for STEC, and the peak at 877 cm−1 (C–O–C stretching/Trp) is shared by all species, except S. dysenteriae (Gonzalez-Gonzalez et al., 2022). Furthermore, the peak at 855 cm−1 (ring breathing Tyr protein) is shared by all Shigella spp., except S. boydii, and the peak at 1209 cm−1 (Tyr-Phe) is shared by all species, except S. dysenteriae (Liu et al., 2021b). Detailed information on these characteristic peaks from the SERS spectra is provided in Table 1. Although E. coli pathotypes and Shigella spp. display some distinct spectral features in their SERS profiles, the subtle differences make it challenging to reliably distinguish these bacterial types.
Fig. 2.
The SERS spectra of pathogenic E. coli and Shigella spp. The solid line represents the average intensity of the SERS spectrum, whereas the shaded area represents one standard deviation. The number of spectra collected for each bacterial type is as follows: S. flexneri (n = 881), S. boydii (n = 952), S. dysenteriae (n = 910), S. sonnei (n = 881), EAEC (n = 966), EIEC (n = 767), EPEC (n = 881), ETEC (n = 671), and STEC (n = 910).
Table 1.
Distinct peaks identified in the average SERS spectra of pathogenic E. coli and Shigella spp. and their corresponding band assignments.
| Raman Shift (cm−1) | Band Assignment | S. s | S. f | S. b | S. d | STEC | EPEC | ETEC | EAEC | EIEC | Ref |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 653 | C-C twisting in tyrosine | | | | | | | | | | (Li et al., 2020) |
| 734 | Adenine, DNA | | | | | | | | | | (Walter et al., 2011) |
| 786 | Cytosine, Uracil (ring stretching) | | | | | | | | | | (Gonzalez-Gonzalez et al., 2022) |
| 855 | Ring breathing Tyr protein | | | | | | | | | | (S. Liu et al., 2021) |
| 877 | C–O–C stretching, Trp | | | | | | | | | | (Gonzalez-Gonzalez et al., 2022) |
| 955 | υ (CN) | | | | | | | | | | (Liu et al., 2017) |
| 1003 | Phenylalanine ring breathing | | | | | | | | | | (Li et al., 2020) |
| 1030 | C-H in-plane phenylalanine | | | | | | | | | | (Li et al., 2020) |
| 1093 | δ (CC, –COH) of carbohydrates | | | | | | | | | | (Walter et al., 2011) |
| 1131 | υ (COC) ring breathing | | | | | | | | | | (Liu et al., 2017) |
| 1209 | Tyr-Phe | | | | | | | | | | (Gonzalez-Gonzalez et al., 2022) |
| 1245 | Amide I | | | | | | | | | | (Liu et al., 2017) |
| 1272 | Amide III | | | | | | | | | | (S. Liu et al., 2021) |
| 1331 | υ (NH2) adenine, polyadenine | | | | | | | | | | (Walter et al., 2011) |
| 1376 | υ (COO-) | | | | | | | | | | (Walter et al., 2011) |
| 1461 | CH2 deformation lipids | | | | | | | | | | (S. Liu et al., 2021) |
| 1544 | Amide II, υ (CN), γ (NH) | | | | | | | | | | (Liu et al., 2017) |
| 1586 | Amide II, tyrosine, adenine | | | | | | | | | | (Li et al., 2020) |
3.2. Preprocessing techniques for enhanced SERS spectral reliability and reproducibility
We acquired 7819 SERS spectra, during which we detected instances of spiking in some of the data. These spikes, characterized by narrow positive peaks at random positions, are caused by high-energy particles, predominantly muons, striking the charge-coupled device in Raman systems (Groom, 2002). The presence of these spikes can markedly hinder analysis, introducing erroneous variables in multivariate curve resolution or regression techniques and potentially causing misclassification (Zhang and Henson, 2007). To address this issue, we used a despiking algorithm based on the modified Z-score method for the detection and adjustment of outliers, following baseline correction (Whitaker and Hayes, 2018). This methodology remarkably improved the reliability and accuracy of our analysis, thereby validating the efficacy of our approach in the analysis of large-scale SERS spectral data.
Moreover, it is essential to address the spectral shifts caused by equipment variability, wavelength differences, or changes in bacterial metabolites for ensuring reproducible and universally applicable results (Davis et al., 2007). For instance, peaks recorded on benchtop instruments may shift slightly when measured on portable devices. Recognizing and then correcting these shifts is critical for supporting robustness against experimental spectral variability (Gopal and Muthu, 2024). To further examine batch-related spectral variability, additional validation experiments were conducted on two separate days using independently synthesized Au@Ag nanoparticle batches and the same bacterial strain. Comparison of representative Raman bands showed that several major peaks remained identical across batches, whereas the remaining peaks exhibited minor positional differences of approximately 2.8–3.4 cm⁻¹ (Table S6). To mitigate these issues, we adopted the P-Bin technique that constructs a master peak list from the dataset to automatically identify peaks and correct shifts, thereby improving compatibility across instruments and enhancing the robustness of the analysis (Chai et al., 2023). This process groups similar peaks at identical wavelengths, ensuring consistency and reliability across different devices and conditions. This result supports the robustness of the P-Bin preprocessing against batch-dependent spectral variation. Finally, the spectra were standardized to ensure that each feature has zero mean and unit variance, thereby facilitating effective deep learning model training.
3.3. Performance evaluation of deep learning classifiers for SERS spectral analysis
Discrimination based solely on the relative intensity of peak values and visual inspection cannot provide definitive criteria for distinguishing each type of bacteria. We used t-distributed stochastic neighbor embedding (t-SNE) to visualize variations in SERS spectra among bacterial types. The algorithm transforms the high-dimensional SERS spectral data into a two-dimensional space, enabling the visualization of intergroup relationships (Tseng et al., 2023). As depicted in Fig. 3, t-SNE visualizes clusters corresponding to each bacterial type. However, considerable overlaps between clusters indicate that the spectral differences are too subtle for definitive discrimination. Hence, robust classification algorithms are essential for analyzing such spectra.
Fig. 3.
Visualization of SERS spectral clustering of E. coli pathotypes and Shigella spp. using t-SNE. Each point corresponds to the SERS spectrum of an individual bacterium, with clusters representing distinct bacterial types. Overlapping regions reflect spectral similarities, underscoring the need for advanced classification algorithms to achieve accurate differentiation.
Several studies have demonstrated that CNNs outperform traditional classification algorithms in identifying and categorizing SERS spectra (Liu et al., 2022; Tang et al., 2022, 2021; Wang et al., 2022). Therefore, we used a 1D-CNN, which is designed to handle sequential data such as time series by allowing its kernel to move along the feature axis and capture local correlations between neighboring variables (Zhang et al., 2023). We also used classical deep learning techniques, such as MLP, along with traditional machine learning models, including support vector machine (SVM) and random forest (RF), to enable a comprehensive comparison of performance. These comparative analyses are particularly vital considering that microbial SERS spectra often display high dimensionality and subtle nonlinear patterns. Deep learning models, especially 1D-CNNs, are well-suited for capturing such complex and localized features, thereby providing a performance advantage over traditional approaches. Our findings support this advantage, demonstrating the superior effectiveness of deep learning in microbial spectral analysis.
The results, as shown in Table 2, demonstrate that the 1D-CNN model outperforms the other classifiers (MLP, RF, and SVM), exhibiting the highest accuracy (97.70%), precision (97.72%), recall (97.70%), and F1 score (weighted: 97.70%; macro: 97.64%). This superiority is attributed to the ability of 1D-CNN to efficiently process continuous spectral data, thereby enabling the detection of local patterns along the wavenumber axis that are essential for the analysis of complex SERS spectral data (Hamed Mozaffari and Tay, 2022). Although MLP demonstrated relatively good performance with an accuracy of 93.99%, RF and SVM were less effective, displaying accuracies of 90.60% and 83.38%, respectively. These results suggest that traditional classifiers struggle with the complexities of datasets such as those found in microbial SERS spectra (Wang et al., 2020).
Table 2.
Comparative performance analysis of classification algorithms for SERS spectral data.
| Models | Accuracy | Precision | Recall | F1_weighted | F1_macro |
|---|---|---|---|---|---|
| 1D-CNN | 97.70% | 97.72% | 97.70% | 97.70% | 97.64% |
| MLP | 93.99% | 94.04% | 93.99% | 93.98% | 93.96% |
| RF | 90.60% | 90.87% | 90.60% | 90.64% | 90.55% |
| SVM | 83.38% | 83.74% | 83.38% | 83.40% | 83.31% |
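The difference between the weighted and macro F1 columns in Table 2 can be made concrete with a small sketch. The label lists below are invented toy data, and the helper names are hypothetical; the code only illustrates how the two averages are defined, not how the reported values were computed.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 score of each class, computed one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * n for c, n in support.items()) / len(y_true)


# Invented, imbalanced toy data: 8 STEC spectra and 2 EIEC spectra.
y_true = ["STEC"] * 8 + ["EIEC"] * 2
y_pred = ["STEC"] * 8 + ["EIEC", "STEC"]  # one EIEC spectrum misclassified

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_macro = macro_f1(y_true, y_pred)
f1_weighted = weighted_f1(y_true, y_pred)
```

In this toy case the macro score is lower than the weighted score because the error falls on the minority class. A small gap between the two, as in Table 2, is consistent with errors being spread fairly evenly across classes.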
To further evaluate the robustness and generalization capability of the model, an additional strain-wise holdout evaluation was performed. In this evaluation setting, spectra derived from the same bacterial strain were grouped together and assigned exclusively to a single dataset partition, and a subset of bacterial strains was reserved as independent test strains.
The classification performance under this strain-wise evaluation protocol is summarized in Table S7. The model achieved an accuracy of 0.8738 on an independent test set consisting of 800 spectra from 18 previously unseen bacterial strains. Although the performance decreased compared with the spectrum-level random splitting used in the primary analysis, the model maintained stable classification performance on spectra obtained from previously unseen strains.
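The grouping constraint described above can be sketched as follows. The function name, the strain identifiers, and the 20% test fraction are assumptions for illustration; the sketch also ignores class stratification, which an actual protocol may additionally enforce.

```python
import random


def strainwise_split(strain_ids, test_fraction=0.2, seed=0):
    """Split spectrum indices so that no strain appears in both partitions.

    `strain_ids[i]` is the strain from which spectrum i was acquired.
    Whole strains, not individual spectra, are assigned to the test set.
    """
    strains = sorted(set(strain_ids))
    rng = random.Random(seed)
    rng.shuffle(strains)
    n_test = max(1, round(len(strains) * test_fraction))
    held_out = set(strains[:n_test])
    train_idx = [i for i, s in enumerate(strain_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(strain_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical example: 5 strains with 4 spectra each.
strain_ids = [f"strain_{k}" for k in range(5) for _ in range(4)]
train_idx, test_idx = strainwise_split(strain_ids, test_fraction=0.2, seed=0)

train_strains = {strain_ids[i] for i in train_idx}
test_strains = {strain_ids[i] for i in test_idx}
```

Because every spectrum of a held-out strain is in the test set, the resulting accuracy reflects generalization to unseen strains rather than to unseen spectra of strains already encountered in training.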
To assess the performance of a classifier properly, it is crucial to use a range of metrics beyond accuracy alone. The learning curve and loss curve for the 1D-CNN model (Fig. 4A) provide an overview of the training and validation performance of the model across epochs. As training progresses, the accuracy of the model on the training and validation sets approaches 100% and stabilizes by the 30th epoch. This indicates a strong fit of the model to the training data with little evidence of overfitting. The confusion matrix for the 1D-CNN model (Fig. 4B) shows that this classifier achieves high identification accuracy when distinguishing between E. coli pathotypes and Shigella spp. Remarkably, it achieves 98.95% accuracy even in traditionally indistinguishable cases such as EIEC and Shigella spp., which are closely related genetically and share overlapping pathogenic mechanisms (van den Beld and Reubsaet, 2012). The t-SNE visualization of features extracted from the dense layer of the 1D-CNN (Fig. 4C) shows distinct and well-separated clusters corresponding to each bacterial class, in sharp contrast to the raw spectral data (Fig. 3), which display considerable overlap. This emphasizes the ability of the 1D-CNN model to learn meaningful representations that capture the subtle differences between classes, which are less distinguishable in the original data due to overlapping spectral features.
Fig. 4.
Performance metrics and feature visualization of the 1D-CNN model for SERS spectral classification. (A) Learning and loss curves of the 1D-CNN model during training and validation phases. (B) Confusion matrix of the 1D-CNN model showing classification results for E. coli pathotypes and Shigella spp. (C) t-SNE visualization of dense layer features extracted by the 1D-CNN model.
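As a reading aid for a confusion matrix such as Fig. 4B, the following minimal sketch builds a matrix from label lists and derives per-class accuracy (recall) from its rows. The labels and predictions are invented toy data, not values from this study.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal entry divided by the row sum, for each true class."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Invented toy data for two closely related classes.
labels = ["EIEC", "S. flexneri"]
y_true = ["EIEC"] * 4 + ["S. flexneri"] * 4
y_pred = ["EIEC", "EIEC", "EIEC", "S. flexneri"] + ["S. flexneri"] * 4

matrix = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(matrix)
```

Off-diagonal entries show which pairs of classes are confused with each other, which is the information needed to judge difficult pairs such as EIEC and Shigella spp.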
3.4. Interpreting deep learning models: feature importance in 1D-CNN-based classification
The XAI framework was implemented using the SHAP explainer to improve the interpretability of the 1D-CNN model applied to SERS spectral data. SHAP evaluates the contribution of each spectral feature to the model predictions and thereby estimates the relative importance of each Raman shift position (Zhong et al., 2024). This enables the identification of key spectral peaks in the SERS data and the analysis of how specific spectral positions affect the predictive outcomes of the model.
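The attribution principle underlying SHAP can be illustrated by computing exact Shapley values for a very small model. This is a sketch under stated assumptions: the three-feature input, the all-zero baseline, and `toy_model` are hypothetical, and the exhaustive enumeration shown here is feasible only for a handful of features. The SHAP library relies on approximations for real networks, and this sketch does not reproduce the explainer used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical intensities at three spectral positions and a zero baseline.
x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]


def toy_model(z):
    """Toy score: two informative features with an interaction; z[2] is ignored."""
    return 3.0 * z[0] + 1.0 * z[1] + 2.0 * z[0] * z[1]


def value_fn(subset):
    """Model output when only features in `subset` take their observed value."""
    z = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return toy_model(z)


phi = shapley_values(value_fn, len(x))
```

The attributions sum to the difference between the model output for the sample and for the baseline, the interaction term is shared equally between the two interacting features, and the ignored feature receives zero importance.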
The SHAP-based analysis of feature importance (Fig. 5) showed that the vibrational bands associated with aromatic amino acids, particularly phenylalanine- and tyrosine-related peaks, were the dominant contributors to model predictions. These residues are abundant in several virulence-associated proteins and toxins (DeVinney et al., 2001), and their distinct vibrational signatures probably reflect variations in the protein composition and structural environments across E. coli pathotypes and Shigella species. In addition to these dominant aromatic amino acid signals, several auxiliary vibrational modes provided secondary contributions to classification. The C–N stretching (ν(CN)) band reflects differences in peptide bond environments within proteins (Kotobi et al., 2023), and the C–O–C ring breathing (ν(COC)) mode is primarily associated with polysaccharides and glycoproteins, indicating variations in cell wall and outer membrane structures such as lipopolysaccharides (Chylińska et al., 2014). Moreover, the amide I/II regions originate from peptide backbone vibrations and are indicative of polypeptide secondary structures (Kotobi et al., 2023). Altogether, these data suggest that while aromatic amino acid residues represent the primary molecular determinants for distinguishing E. coli pathotypes from Shigella species, auxiliary features such as the ν(CN), ν(COC), and amide I/II bands provide complementary discriminatory power by reflecting differences in protein backbone conformations, polysaccharide composition, and overall metabolic signatures.
Fig. 5.
Feature importance of molecular components contributing to 1D-CNN classification of E. coli pathotypes and Shigella species based on SERS spectra: (A) Shigella flexneri; (B) Shigella boydii; (C) Shigella dysenteriae; (D) Shigella sonnei; (E) EAEC; (F) EIEC; (G) STEC; (H) ETEC; (I) EPEC. Each bar represents a molecular component matched to a vibrational band, with its length reflecting its relative contribution to the model's predictive performance.
Importantly, the differential weighting of these peaks across bacterial classes indicates pathotype-specific virulence and metabolic adaptations. The increased contribution of cytosine- and uracil-associated peaks in S. flexneri, S. sonnei, S. boydii, S. dysenteriae, and EIEC suggests differences in nucleic acid environments, potentially influenced by plasmid-encoded invasion factors (Haidar-Ahmad et al., 2023). Consistent with this observation, Shigella spp. and EIEC also showed stronger contributions from aromatic amino acid– and cell envelope–related bands, suggesting a molecular organization associated with intracellular invasion and host–cell interaction.
In contrast, the comparatively lower contribution of these nucleic acid signals in EAEC, EPEC, ETEC, and STEC is consistent with their dependence on protein-based virulence mechanisms such as secreted toxins and adhesins (Cepeda-Molero et al., 2017). Correspondingly, EAEC, EPEC, ETEC, and STEC were more strongly characterized by protein-related vibrational signatures, including amide and C–N bands, consistent with pathogenic strategies centered on extracellular adhesion, toxin delivery, and colonization rather than epithelial invasion. Together, these findings suggest that invasive pathotypes are more strongly characterized by nucleic acid–related features, whereas toxin- and adhesion-based pathotypes are better captured by protein-associated vibrational signatures.
Notably, EIEC appeared to occupy an intermediate but invasion-skewed spectral position, sharing several influential features with Shigella spp. while remaining within the broader E. coli pathotype framework. This pattern is biologically meaningful because EIEC is widely regarded as an evolutionary and pathogenic bridge between classical diarrheagenic E. coli and Shigella (van den Beld and Reubsaet, 2012). From this perspective, the hierarchy of features identified by XAI suggests that the model captured biologically relevant gradients of pathogenic diversity rather than relying solely on superficial spectral separability.
Similarly, the higher contribution of phenylalanine-associated peaks in invasive pathotypes such as Shigella spp. and EIEC than in toxin-producing types (STEC and ETEC) may reflect the role of phenylalanine residues in forming structural elements within the flagellar system that support invasive phenotypes (Miletic et al., 2021).
To evaluate the robustness of the SHAP-based explanations, a fold-based sensitivity analysis of SHAP feature rankings was conducted (Table S8). SHAP importance values were computed independently for three fold-specific evaluation subsets and compared pairwise. The analysis showed moderate global consistency of feature rankings across folds (mean Spearman ρ = 0.619 ± 0.065). The overlap among small top-k feature sets was relatively limited, indicating that the precise ordering of individual peak-level features varied across folds.
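The two quantities used in this sensitivity analysis, the Spearman rank correlation between fold-specific importance vectors and the overlap of top-k feature sets, can be sketched as follows. The importance values are invented, and the helper names are hypothetical; the sketch only shows how global rank agreement and top-k agreement can diverge.

```python
def ranks(values):
    """Rank values (1 = smallest), assigning average ranks to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))


def top_k_overlap(a, b, k):
    """Fraction of the k most important features shared by both vectors."""
    def top(values):
        ordered = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(ordered[:k])
    return len(top(a) & top(b)) / k


# Invented mean |SHAP| values for five features in two folds.
fold_a = [0.9, 0.7, 0.5, 0.3, 0.1]
fold_b = [0.8, 0.9, 0.1, 0.4, 0.2]

rho = spearman_rho(fold_a, fold_b)
overlap_top1 = top_k_overlap(fold_a, fold_b, k=1)
overlap_top2 = top_k_overlap(fold_a, fold_b, k=2)
```

In this toy case the global correlation is moderate while the single most important feature differs between folds, which mirrors the distinction drawn above between global ranking consistency and the ordering of individual peak-level features.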
However, interpretation of these spectral contributions remains limited because assigning SERS bands to specific molecular components is inherently challenging and requires further experimental validation. Despite these constraints, the consistent class-dependent patterns observed across bacterial groups indicate that the XAI–SERS framework provides not only robust discriminatory performance but also biologically meaningful insight into the molecular organization underlying pathogenic diversity.
4. Limitations and future perspectives
This study has several limitations that should be acknowledged. First, although the strain set included not only reference strains but also a broad collection of human and environmental isolates, the composition of the dataset was not fully balanced across all target groups. In particular, most pathogenic E. coli strains were represented by isolates originating from food, water, and human sources, whereas the Shigella panel relied predominantly on human-derived strains obtained from national pathogen repositories. This imbalance reflects a practical constraint rather than a design preference, as Shigella strains are less accessible in environmental and food-associated settings owing to their higher biosafety relevance and the limited availability of archived nonclinical isolates. Accordingly, while the present dataset still reflects the clinically meaningful context in which Shigella is most commonly encountered, further expansion using geographically and environmentally diverse nonclinical isolates would strengthen the generalizability of the platform and provide a more rigorous assessment of robustness against source-dependent variability and matrix-related effects.
Nevertheless, the additional strain-wise holdout evaluation conducted in this study provides further evidence of the robustness of the proposed framework. Even under this more stringent evaluation protocol, which tests the model using spectra derived from previously unseen bacterial strains, the model maintained a stable classification accuracy exceeding 0.87. This result suggests that the proposed framework captures generalizable spectral characteristics associated with pathogenic groups rather than relying solely on strain-specific spectral patterns.
Second, the molecular interpretation derived from SHAP analysis should be considered biologically plausible but not chemically definitive. In the present study, SHAP highlighted discriminative spectral regions associated with functional groups and putative biomolecular classes, including aromatic amino acid-, nucleic acid-, and protein backbone-related vibrations.
Consistent with this observation, the fold-based SHAP sensitivity analysis performed in this study indicated moderate global stability of feature importance rankings across evaluation folds. Although the precise ordering of individual spectral peaks varied across subsets, similar spectral regions were consistently identified as informative for model predictions. This pattern suggests that the model captures broadly consistent spectral signals associated with bacterial classification rather than relying on highly dataset-specific peak assignments.
However, because SERS bands in bacterial spectra often reflect overlapping contributions from multiple cellular components, direct one-to-one experimental validation of each model-identified feature using orthogonal techniques such as mass spectrometry remains challenging. Therefore, the present XAI framework should be interpreted primarily as an approach for identifying informative spectral signatures rather than as definitive molecular assignment. Future studies integrating targeted metabolomics, proteomics, standard compound-based spectral referencing, or multimodal spectroscopic validation will be necessary to further resolve the chemical origins of these discriminatory signals.
Third, although the present XAI-SERS framework demonstrated robust performance for bacterial classification under controlled experimental conditions, its analytical sensitivity in the current label-free format remains insufficient for direct low-burden detection. In a concentration-dependent assessment, meaningful SERS signal detection was observed down to 10⁷ CFU/mL (Figure S2G-H), whereas lower concentrations did not provide sufficient signal intensity for reliable analysis. These findings suggest that the present platform is primarily suited for classification following adequate spectral acquisition, rather than for direct trace-level detection. Future studies should therefore focus on enhancing analytical sensitivity through optimized sample pretreatment, bacterial enrichment, signal amplification strategies, or alternative SERS configurations designed for low-concentration applications.
5. Conclusions
In conclusion, this study shows the value of integrating artificial intelligence into microbiological research, particularly for analyzing spectral data. Traditional methods often struggle to differentiate between E. coli pathotypes and Shigella spp. due to their considerable genotypic and phenotypic similarities. To address this challenge, we used SERS to detect subtle molecular vibrations in microbes and deep learning models to classify the resulting subtle spectral differences. Our 1D-CNN model achieved a classification accuracy of >97% across nine distinct classes of E. coli pathotypes and Shigella spp. This result demonstrates potential for clinical applications and shows that the approach can perform comprehensive multiclass analysis in a single assay. Moreover, the XAI-based SHAP analysis indicated that spectral features associated with proteins linked to pathogenic mechanisms were instrumental in distinguishing each bacterial type. To our knowledge, this study is the first to report an XAI-assisted SERS framework specifically for the discrimination of E. coli pathotypes and Shigella spp., providing an interpretable strategy with potential diagnostic relevance.
Data availability
The code and processed spectral dataset used in this study are publicly available at: https://github.com/youzh-all/XAI-SERS
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the BK21 FOUR program of Graduate School, Kyung Hee University (GS-1-JO-ON-20241887).
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.crmicr.2026.100588.
Contributor Information
Jungmok You, Email: jmyou@khu.ac.kr.
Dae-Hyun Jung, Email: daehyun@khu.ac.kr.
Hae-Yeong Kim, Email: hykim@khu.ac.kr.
References
- Saranya A., Subhashini R. A systematic review of explainable artificial Intelligence models and applications: recent developments and future trends. Decis. Anal. J. 2023;7 doi: 10.1016/j.dajour.2023.100230.
- Aydin Ö., Altaş M., Kahraman M., Bayrak Ö.F., Çulha M. Differentiation of healthy brain tissue and tumors using surface-enhanced Raman scattering. Appl. Spectrosc. 2009;63(10):1095–1100. doi: 10.1366/000370209789553219.
- Bennish M.L., Ahmed S. In: Hunter's Tropical Medicine and Emerging Infectious Diseases. 10th Edition. Ryan E.T., Hill D.R, Solomon T., Aronson N.E., Endy T.P., editors. Elsevier; London: 2020. 48 - Shigellosis; pp. 492–499.
- Bi L., Wang X., Cao X., Liu L., Bai C., Zheng Q., Choo J., Chen L. SERS-active Au@Ag core-shell nanorod (Au@AgNR) tags for ultrasensitive bacteria detection and antibiotic-susceptibility testing. Talanta. 2020;220 doi: 10.1016/j.talanta.2020.121397.
- Bi L., Zhang H., Hu W., Chen J., Wu Y., Chen H., Li B., Zhang Z., Choo J., Chen L. Self-assembly of Au@AgNR along M13 framework: a SERS nanocarrier for bacterial detection and killing. Biosens. Bioelectron. 2023;237 doi: 10.1016/j.bios.2023.115519.
- Bi L., Zhang H., Mu C., Sun K., Chen H., Zhang Z., Chen L. Paper-based SERS chip with adaptive attention neural network for pathogen identification. J. Hazard. Mater. 2025;494 doi: 10.1016/j.jhazmat.2025.138694.
- Buyuktepe O., Catal C., Kar G., Bouzembrak Y., Marvin H., Gavai A. Food fraud detection using explainable artificial intelligence. Expert Syst. 2025;42(1) doi: 10.1111/exsy.13387.
- Cepeda-Molero M., Berger C.N., Walsham A.D.S., Ellis S.J., Wemyss-Holden S., Schüller S., Frankel G., Fernández L. Attaching and effacing (A/E) lesion formation by enteropathogenic E. coli on human intestinal mucosa is dependent on non-LEE effectors. PLoS Pathog. 2017;13(10) doi: 10.1371/journal.ppat.1006706.
- Chai X., Liu C., Fan X., Huang T., Zhang X., Jiang B., Liu M. Combination of peak-picking and binning for NMR-based untargeted metabonomics study. J. Magn. Reson. 2023;351 doi: 10.1016/j.jmr.2023.107429.
- Chylińska M., Szymańska-Chargot M., Zdunek A. Imaging of polysaccharides in the tomato cell wall with Raman microspectroscopy. Plant Methods. 2014;10(1):14. doi: 10.1186/1746-4811-10-14.
- Ciloglu F.U., Caliskan A., Saridag A.M., Kilic I.H., Tokmakci M., Kahraman M., Aydin O. Drug-resistant Staphylococcus aureus bacteria detection by combining surface-enhanced Raman spectroscopy (SERS) and deep learning techniques. Sci. Rep. 2021;11(1) doi: 10.1038/s41598-021-97882-4.
- Croxen M.A., Law R.J., Scholz R., Keeney K.M., Wlodarska M., Finlay B.B. Recent advances in understanding enteric pathogenic Escherichia coli. Clin. Microbiol. Rev. 2013;26(4):822–880. doi: 10.1128/cmr.00022-13.
- Damm C., Segets D., Yang G., Vieweg B.F., Spiecker E., Peukert W. Shape transformation mechanism of silver nanorods in aqueous solution. Small. 2011;7(1):147–156. doi: 10.1002/smll.201001600.
- Davis R.A., Charlton A.J., Godward J., Jones S.A., Harrison M., Wilson J.C. Adaptive binning: an improved binning method for metabolomics data using the undecimated wavelet transform. Chemom. Intell. Lab. Syst. 2007;85(1):144–154. doi: 10.1016/j.chemolab.2006.08.014.
- DeVinney R., Puente J.L., Gauthier A., Goosney D., Finlay B.B. Enterohaemorrhagic and enteropathogenic Escherichia coli use a different Tir-based mechanism for pedestal formation. Mol. Microbiol. 2001;41(6):1445–1458. doi: 10.1046/j.1365-2958.2001.02617.x.
- Fleming-Dutra K.E., Hersh A.L., Shapiro D.J., Bartoces M., Enns E.A., File T.M., Finkelstein J.A., Gerber J.S., Hyun D.Y., Linder J.A. Prevalence of inappropriate antibiotic prescriptions among US ambulatory care visits, 2010-2011. Jama. 2016;315(17):1864–1873. doi: 10.1001/jama.2016.4151.
- Freedman S.B., Xie J., Neufeld M.S., Hamilton W.L., Hartling L., Tarr P.I., Nettel-Aguirre A., Chuck A., Lee B., Johnson D., Currie G., Talbot J., Jiang J., Dickinson J., Kellner J., MacDonald J., Svenson L., Chui L., Louie M., Lavoie M., Eltorki M., Vanderkooi O., Tellier R., Ali S., Drews S., Graham T., Pang X.L. Shiga toxin-producing Escherichia coli infection, antibiotics, and risk of developing hemolytic uremic syndrome: a meta-analysis. Clin. Infect. Dis. 2016;62(10):1251–1258. doi: 10.1093/cid/ciw099.
- Galvan D.D., Yu Q. Surface-enhanced raman scattering for rapid detection and characterization of antibiotic-resistant bacteria. Adv. Healthc. Mater. 2018;7(13) doi: 10.1002/adhm.201701335.
- Gomes T.A., Elias W.P., Scaletsky I.C., Guth B.E., Rodrigues J.F., Piazza R.M., Ferreira L., Martinez M.B. Diarrheagenic Escherichia coli. Braz. J. Microbiol. 2016;47:3–30. doi: 10.1016/j.bjm.2016.10.015.
- Gómez-Rojo E.M., Romero-Santacreu L., Jaime I., Rovira J. A novel real-time PCR assay for the specific identification and quantification of weissella viridescens in blood sausages. Int. J. Food Microbiol. 2015;215:16–24. doi: 10.1016/j.ijfoodmicro.2015.08.002.
- Gonzalez-Gonzalez C.R., Hansen M., Stratakos A.C. Rapid identification of foodborne pathogens in limited resources settings using a handheld Raman spectroscopy device. Appl. Sci. 2022;12(19):9909. doi: 10.3390/app12199909.
- Gopal J., Muthu M. Handheld portable analytics for food fraud detection, the evolution of next-generation smartphone-based food sensors: the journey, the milestones, the challenges debarring the destination. TrAC-Trends Anal. Chem. 2024;171 doi: 10.1016/j.trac.2023.117504.
- Groom D. Cosmic rays and other nonsense in astronomical CCD imagers. Exp. Astron. 2002;14(1):45–55. doi: 10.1023/A:1026196806990.
- Haidar-Ahmad N., Manigat F.O., Silué N., Pontier S.M., Campbell-Valois F.-X. A tale about Shigella: evolution, plasmid, and virulence. Microorganisms. 2023;11(7):1709. doi: 10.3390/microorganisms11071709. https://www.mdpi.com/2076-2607/11/7/1709
- Halimeh F.B., Rafei R., Osman M., Kassem I.I., Diene S.M., Dabboussi F.…Hamze M. Historical, current, and emerging tools for identification and serotyping of Shigella. Braz. J. Microbiol. 2021;52(4):2043–2055. doi: 10.1007/s42770-021-00573-5.
- Hamed Mozaffari M., Tay L.-L. Overfitting one-dimensional convolutional neural networks for Raman spectra identification. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2022;272 doi: 10.1016/j.saa.2022.120961.
- Havelaar A.H., Kirk M.D., Torgerson P.R., Gibb H.J., Hald T., Lake R.J., Praet N., Bellinger D.C., De Silva N.R., Gargouri N. World Health Organization global estimates and regional comparisons of the burden of foodborne disease in 2010. PLoS Med. 2015;12(12) doi: 10.1371/journal.pmed.1001923.
- Hendriks A.C., Reubsaet F.A., Kooistra-Smid A., Rossen J.W., Dutilh B.E., Zomer A.L., van Den Beld M.J. Genome-wide association studies of Shigella spp. And Enteroinvasive Escherichia coli isolates demonstrate an absence of genetic markers for prediction of disease severity. BMC Genom. 2020;21:1–12. doi: 10.1186/s12864-020-6555-7.
- Huang X., Sheng B., Tian H., Chen Q., Yang Y., Bui B., Pi J., Cai H., Chen S., Zhang J., Chen W., Zhou H., Sun P. Real-time SERS monitoring anticancer drug release along with SERS/MR imaging for pH-sensitive chemo-phototherapy. Acta Pharm. Sin. B. 2023;13(3):1303–1317. doi: 10.1016/j.apsb.2022.08.024.
- Huang X., Yang Y., Zhou H., Hu L., Yang A., Jin H., Zheng B., Pi J., Xu J., Sun P., Cai H.-H., Liang X., Pan B., Zheng J., Zhou H. Coupling of an Au@AgPt nanozyme array with an micrococcal nuclease-specific responsiveness strategy for colorimetric/SERS sensing of Staphylococcus aureus in patients with sepsis. J. Pharm. Anal. 2025;15(2) doi: 10.1016/j.jpha.2024.101085.
- Ikuta K.S., Swetschinski L.R., Aguilar G.R., Sharara F., Mestrovic T., Gray A.P., Weaver N.D., Wool E.E., Han C., Hayoon A.G. Global mortality associated with 33 bacterial pathogens in 2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2022;400(10369):2221–2248. doi: 10.1016/S0140-6736(22)02185-7.
- Jeon Y., Lee S., Jeon Y.-J., Kim D., Ham J.-H., Jung D.-H., Kim H.-Y., You J. Rapid identification of pathogenic bacteria using data preprocessing and machine learning-augmented label-free surface-enhanced Raman scattering. Sens. Actuators B Chem. 2025;425 doi: 10.1016/j.snb.2024.136963.
- Kahraman M., Keseroğlu K., Culha M. On sample preparation for surface-enhanced raman scattering (SERS) of bacteria and the source of spectral features of the spectra. Appl. Spectrosc. 2011;65(5):500–506. doi: 10.1366/10-06184.
- Kahraman M., Wachsmann-Hogiu S. Label-free and direct protein detection on 3D plasmonic nanovoid structures using surface-enhanced Raman scattering. Anal. Chim. Acta. 2015;856:74–81. doi: 10.1016/j.aca.2014.11.019.
- Kaper J.B., Nataro J.P., Mobley H.L. Pathogenic Escherichia coli. Nat. Rev. Microbiol. 2004;2(2):123–140. doi: 10.1038/nrmicro818.
- Kim K., Park S., Kang S., Lee M.K., Chen L., Choo J. SERS-based aptasensor for culture-free detection of Escherichia coli in urinary tract infection diagnosis. Nano Converg. 2025;12(1):40. doi: 10.1186/s40580-025-00506-0.
- Kotloff K.L., Riddle M.S., Platts-Mills J.A., Pavlinac P., Zaidi A.K.M. Shigellosis. Lancet. 2018;391(10122):801–812. doi: 10.1016/S0140-6736(17)33296-8.
- Kotłowski R., Grecka K., Kot B., Szweda P. New approaches for Escherichia coli genotyping. Pathogens. 2020;9(2):73. doi: 10.3390/pathogens9020073.
- Kotobi A., Schwob L., Vonbun-Feldbauer G.B., Rossi M., Gasparotto P., Feiler C., Berden G., Oomens J., Oostenrijk B., Scuderi D., Bari S., Meißner R.H. Reconstructing the infrared spectrum of a peptide from representative conformers of the full canonical ensemble. Commun. Chem. 2023;6(1):46. doi: 10.1038/s42004-023-00835-3.
- Kovacs G., Loutfy R., Vincett P., Jennings C., Aroca R. Distance dependence of SERS enhancement factor from Langmuir-Blodgett monolayers on metal island films: evidence for the electromagnetic mechanism. Langmuir. 1986;2(6):689–694. doi: 10.1021/la00072a001.
- Lan R., Reeves P.R. Escherichia coli in disguise: molecular origins of Shigella. Microbes Infect. 2002;4(11):1125–1132. doi: 10.1016/s1286-4579(02)01637-4.
- Lee H., Liao J.-D., Tsai H.-P., Chen C.-H., Sitjar J., Fu W.-E., Lin F.-H. Label-free SERS method with size-matched selectivity for analytes of varying sizes. Surf. Interfaces. 2024;44 doi: 10.1016/j.surfin.2023.103821.
- Li Y., Guo Y., Ye B., Zhuang Z., Lan P., Zhang Y., Zhong H., Liu H., Guo Z., Liu Z. Rapid label-free SERS detection of foodborne pathogenic bacteria based on hafnium ditelluride-Au nanocomposites. J. Innov. Opt. Health Sci. 2020;13(05) doi: 10.1142/S1793545820410047.
- Liu S., Hu Q., Li C., Zhang F., Gu H., Wang X., Li S., Xue L., Madl T., Zhang Y. Wide-range, rapid, and specific identification of pathogenic bacteria by Surface-enhanced Raman spectroscopy. ACS Sens. 2021;6(8):2911–2919. doi: 10.1021/acssensors.1c00641.
- Liu W., Tang J.-W., Lyu J.-W., Wang J.-J., Pan Y.-C., Shi X.-Y., Liu Q.-H., Zhang X., Gu B., Wang L. Discrimination between carbapenem-resistant and carbapenem-sensitive Klebsiella pneumoniae strains through computational analysis of surface-enhanced Raman spectra: a pilot study. Microbiol. Spectr. 2022;10(1):e02409–e02421. doi: 10.1128/spectrum.02409-21.
- Liu Y., Kim M., Cho S.H., Jung Y.S. Vertically aligned nanostructures for a reliable and ultrasensitive SERS-active platform: fabrication and engineering strategies. Nano Today. 2021;37 doi: 10.1016/j.nantod.2020.101063.
- Liu Y., Zhou H., Hu Z., Yu G., Yang D., Zhao J. Label and label-free based surface-enhanced raman scattering for pathogen bacteria detection: a review. Biosens. Bioelectron. 2017;94:131–140. doi: 10.1016/j.bios.2017.02.032.
- Lundberg S.M., Lee S.-I. arXiv preprint arXiv: 1706.06060. 2017. Consistent feature attribution for tree ensembles.
- Martiny D., Busson L., Wybo I., El Haj R.A., Dediste A., Vandenberg O. Comparison of the Microflex LT and Vitek MS systems for routine identification of bacteria by matrix-assisted laser desorption ionization–time of flight mass spectrometry. J. Clin. Microbiol. 2012;50(4):1313–1325. doi: 10.1128/JCM.05971-11.
- Miletic S., Fahrenkamp D., Goessweiner-Mohr N., Wald J., Pantel M., Vesper O., Kotov V., Marlovits T.C. Substrate-engaged type III secretion system structures reveal gating mechanism for unfolded protein translocation. Nat. Commun. 2021;12(1):1546. doi: 10.1038/s41467-021-21143-1.
- Pakbin B., Brück W.M., Rossen J.W. Virulence factors of enteric pathogenic Escherichia coli: a review. Int. J. Mol. Sci. 2021;22(18):9922. doi: 10.3390/ijms22189922.
- Pasqua M., Michelacci V., Di Martino M.L., Tozzoli R., Grossi M., Colonna B., Morabito S., Prosseda G. The intriguing evolutionary journey of enteroinvasive E. coli (EIEC) toward pathogenicity. Front. Microbiol. 2017;8:2390. doi: 10.3389/fmicb.2017.02390.
- Peng Z., Wang X., Huang J., Li B. In: Molecular Medical Microbiology. 3rd Edition. Tang Y.-W., Hindiyeh M.Y., Liu D., Sails A., Spearman P., Zhang J.-R., editors. Academic Press; 2024. Chapter 53 - pathogenic Escherichia coli; pp. 1065–1096. [Google Scholar]
- Premasiri W.R., Lee J.C., Sauer-Budge A., Théberge R., Costello C.E., Ziegler L.D. The biochemical origins of the surface-enhanced Raman spectra of bacteria: a metabolomics profiling by SERS. Anal. Bioanal. Chem. 2016;408:4631–4647. doi: 10.1007/s00216-016-9540-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sadeghi Z., Alizadehsani R., Cifci M.A., Kausar S., Rehman R., Mahanta P., Bora P.K., Almasri A., Alkhawaldeh R.S., Hussain S., Alatas B., Shoeibi A., Moosaei H., Hladík M., Nahavandi S., Pardalos P.M. A review of explainable artificial Intelligence in healthcare. Comput. Electr. Eng. 2024;118 doi: 10.1016/j.compeleceng.2024.109370. [DOI] [Google Scholar]
- Samal A.K., Polavarapu L., Rodal-Cedeira S., Liz-Marzán L.M., Pérez-Juste J., Pastoriza-Santos I. Size tunable Au@ Ag core–shell nanoparticles: synthesis and surface-enhanced raman scattering properties. Langmuir. 2013;29(48):15076–15082. doi: 10.1021/la403707j. [DOI] [PubMed] [Google Scholar]
- Samek O., Bernatová S., Dohnal F. The potential of SERS as an AST methodology in clinical settings. Nanophotonics. 2021;10(10):2537–2561. doi: 10.1515/nanoph-2021-0095.
- Shen Z., Xie L., Hou Y., Liang J., Jia Y., Zhang H., Sun Z., Du J., He Z., Liu C., Liu W. An interpretable SERS-AI platform for rapid and quantitative diagnosis of polymicrobial UTIs: powered by positively charged plasmonic nanoparticles and attention-based deep learning. Adv. Sci. (Weinh) 2025;12(46) doi: 10.1002/advs.202513502.
- Stokes R.J., Macaskill A., Lundahl P.J., Smith W.E., Faulds K., Graham D. Quantitative enhanced Raman scattering of labeled DNA from gold and silver nanoparticles. Small. 2007;3(9):1593–1601. doi: 10.1002/smll.200600662.
- Sun J., Xu X., Feng S., Zhang H., Xu L., Jiang H., Sun B., Meng Y., Chen W. Rapid identification of Salmonella serovars by using Raman spectroscopy and machine learning algorithm. Talanta. 2023;253 doi: 10.1016/j.talanta.2022.123807.
- Tang J.-W., Li J.-Q., Yin X.-C., Xu W.-W., Pan Y.-C., Liu Q.-H., Gu B., Zhang X., Wang L. Rapid discrimination of clinically important pathogens through machine learning analysis of surface enhanced Raman spectra. Front. Microbiol. 2022;13 doi: 10.3389/fmicb.2022.843417.
- Tang J.-W., Liu Q.-H., Yin X.-C., Pan Y.-C., Wen P.-B., Liu X., Kang X.-X., Gu B., Zhu Z.-B., Wang L. Comparative analysis of machine learning algorithms on surface enhanced Raman spectra of clinical Staphylococcus species. Front. Microbiol. 2021;12 doi: 10.3389/fmicb.2021.696921.
- Thrift W.J., Ronaghi S., Samad M., Wei H., Nguyen D.G., Cabuslay A.S., Groome C.E., Santiago P.J., Baldi P., Hochbaum A.I., Ragan R. Deep learning analysis of vibrational spectra of bacterial lysate for rapid antimicrobial susceptibility testing. ACS Nano. 2020;14(11):15336–15348. doi: 10.1021/acsnano.0c05693.
- Troeger C., Blacker B.F., Khalil I.A., Rao P.C., Cao S., Zimsen S.R., Albertson S.B., Stanaway J.D., Deshpande A., Abebe Z. Estimates of the global, regional, and national morbidity, mortality, and aetiologies of diarrhoea in 195 countries: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Infect. Dis. 2018;18(11):1211–1228. doi: 10.1016/S1473-3099(18)30362-1.
- Tseng Y.-M., Chen K.-L., Chao P.-H., Han Y.-Y., Huang N.-T. Deep learning–assisted surface-enhanced Raman scattering for rapid bacterial identification. ACS Appl. Mater. Interfaces. 2023;15(22):26398–26406. doi: 10.1021/acsami.3c03212.
- van den Beld M., Reubsaet F. Differentiation between Shigella, enteroinvasive Escherichia coli (EIEC) and noninvasive Escherichia coli. Eur. J. Clin. Microbiol. Infect. Dis. 2012;31:899–904. doi: 10.1007/s10096-011-1395-7.
- van den Beld M.J., Rossen J.W., Evers N., Kooistra-Smid M.A., Reubsaet F.A. MALDI-TOF MS using a custom-made database, biomarker assignment, or mathematical classifiers does not differentiate Shigella spp. and Escherichia coli. Microorganisms. 2022;10(2):435. doi: 10.3390/microorganisms10020435.
- Walter A., März A., Schumacher W., Rösch P., Popp J. Towards a fast, high specific and reliable discrimination of bacteria on strain level by means of SERS in a microfluidic device. Lab Chip. 2011;11(6):1013–1021. doi: 10.1039/C0LC00536C.
- Wang K., Chen L., Ma X., Ma L., Chou K.C., Cao Y.…Lu X. Arcobacter identification and species determination using Raman spectroscopy combined with neural networks. Appl. Environ. Microbiol. 2020;86(20) doi: 10.1128/aem.00924-20.
- Wang L., Zhang X.-D., Tang J.-W., Ma Z.-W., Usman M., Liu Q.-H., Wu C.-Y., Li F., Zhu Z.-B., Gu B. Machine learning analysis of SERS fingerprinting for the rapid determination of Mycobacterium tuberculosis infection and drug resistance. Comput. Struct. Biotechnol. J. 2022;20:5364–5377. doi: 10.1016/j.csbj.2022.09.031.
- Whitaker D.A., Hayes K. A simple algorithm for despiking Raman spectra. Chemom. Intell. Lab. Syst. 2018;179:82–84. doi: 10.1016/j.chemolab.2018.06.009.
- Zaidi M.B., Estrada-García T. Shigella: a highly virulent and elusive pathogen. Curr. Trop. Med. Rep. 2014;1:81–87. doi: 10.1007/s40475-014-0019-6.
- Zhang J., Feng L., He Y., Wu Y., Dong Y. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023. Temporal convolutional explorer helps understand 1D-CNN's learning behavior in time series classification from frequency domain; pp. 3351–3360.
- Zhang L., Henson M.J. A practical algorithm to remove cosmic spikes in Raman imaging data for pharmaceutical applications. Appl. Spectrosc. 2007;61(9):1015–1020. doi: 10.1366/000370207781745847.
- Zhao J., Lui H., McLean D.I., Zeng H. Automated autofluorescence background subtraction algorithm for biomedical Raman spectroscopy. Appl. Spectrosc. 2007;61(11):1225–1232. doi: 10.1366/000370207782597003.
- Zhong L., Guo X., Ding M., Ye Y., Jiang Y., Zhu Q., Li J. SHAP values accurately explain the difference in modeling accuracy of convolution neural network between soil full-spectrum and feature-spectrum. Comput. Electron. Agric. 2024;217 doi: 10.1016/j.compag.2024.108627.
- Zimmermann S., Horner S., Altwegg M., Dalpke A.H. Workflow optimization for syndromic diarrhea diagnosis using the molecular Seegene Allplex™ GI-bacteria (I) assay. Eur. J. Clin. Microbiol. Infect. Dis. 2020;39:1245–1250. doi: 10.1007/s10096-020-03837-4.
- Zuo G., Xu Z., Hao B. Shigella strains are not clones of Escherichia coli but sister species in the genus Escherichia. Genomics Proteomics Bioinformatics. 2013;11(1):61–65. doi: 10.1016/j.gpb.2012.11.002.
Data Availability Statement
The code and processed spectral dataset used in this study are publicly available at: https://github.com/youzh-all/XAI-SERS






