Abstract
Gastric cancer detection remains challenging due to the lack of noninvasive early diagnostic tools. This study investigates profiling of serum volatile organic compounds (VOCs) using gas chromatography-ion mobility spectrometry (GC–IMS) for gastric cancer screening. Serum samples were obtained from 277 participants, including 123 patients with gastric cancer, 38 patients with precancerous diseases (PD), and 116 healthy controls (HC). In the model development group, Kruskal–Wallis tests showed that the levels of 19 VOCs differed significantly among gastric cancer, PD, and HC groups (p < 0.05). Based on the VOCs that differed significantly, a support vector machine (SVM) model achieved the best performance among the six models tested. Using importance ranking and forward selection, 11 VOCs were selected for the final model, achieving 96.4% accuracy in the validation set and 92.9% in an independent test set, showing higher diagnostic accuracy than the traditional tumor marker carcinoembryonic antigen. The model also achieved 100% sensitivity and > 90% specificity for detecting early gastric cancer in both the validation and test sets. Collectively, our findings suggest that GC–IMS–based serum VOC profiling may offer a potential noninvasive approach for gastric cancer detection.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-026-42602-z.
Keywords: Gastric cancer, Volatile organic compounds, Serum, Gas chromatography-ion mobility spectrometry
Subject terms: Biochemistry, Biomarkers, Cancer, Oncology
Introduction
Gastric cancer remains a major global health burden, ranking fifth in incidence and fourth in cancer-related mortality worldwide1. Owing to nonspecific early symptoms and the absence of effective population-wide screening, most patients are diagnosed at an advanced stage with a poor prognosis2. Although endoscopic tissue biopsy is the gold standard for gastric cancer diagnosis, its invasive nature, high cost, and potential for patient discomfort limit its utility for large-scale population screening. Consequently, there is an urgent need to develop new noninvasive methods to enable early detection and improve patient outcomes.
Volatile organic compounds (VOCs) are carbon-based chemicals with high vapor pressure at ambient temperatures. Biologically, VOCs arise from endogenous biochemical processes and disease-associated metabolic alterations, and subsequently enter the circulatory system. Therefore, disease-specific changes in VOC composition and concentration in biological samples—including blood, urine, feces and exhaled breath—can reflect underlying pathophysiology, making VOCs attractive noninvasive biomarkers for screening3. Previous studies have reported that VOCs detected in exhaled breath can be used for the discrimination of gastric cancer patients from healthy controls, supporting their potential utility in cancer screening4. Among various biofluids, serum is the most widely used specimen in clinical practice because of its accessibility and stability; however, the diagnostic application of serum VOCs in gastric cancer remains incompletely characterized.
Gas chromatography-ion mobility spectrometry (GC–IMS) is an analytical technique that couples the high-resolution separation of gas chromatography with the sensitive detection of ion mobility spectrometry. Characterized by exceptional sensitivity, short analysis time, simple operation, and low cost, GC–IMS provides two-dimensional fingerprint information of VOCs, making it a potentially useful tool for rapid screening applications5. In the field of cancer diagnostics, GC–IMS has shown promise for the non-invasive detection of disease-specific VOC profiles from serum. Recent studies confirm its clinical value in distinguishing hepatocellular carcinoma, biliary tract cancer, and esophageal cancer from controls, achieving high diagnostic accuracy through machine-learning integration6–8. Thus, we speculate that GC–IMS–based serum VOC profiling may facilitate the noninvasive early detection of gastric cancer.
In this study, we applied GC–IMS to profile serum VOCs in individuals with gastric cancer, precancerous diseases (PD), and healthy controls (HC). By integrating supervised machine-learning algorithms with the serum VOCs dataset, we developed a VOC-based classifier model with improved discriminatory power. Collectively, our findings indicate that GC–IMS–based serum VOC profiling may offer a potential noninvasive approach for the detection of gastric cancer.
Results
Study design and clinical characteristics of subjects
The study design is shown in Fig. 1A. The model development group comprised 221 subjects recruited from January to December 2022, including 99 gastric cancer patients, 30 PD patients, and 92 HC subjects. These participants were randomly divided into a training set (n = 166) and an internal validation set (n = 55), an approximately 3:1 split, to construct and optimize the machine learning models. An independent test group (n = 56), consisting of 24 gastric cancer patients, 8 PD patients, and 24 HC subjects, was recruited between January 2023 and March 2023 to evaluate the classification performance of the model. Sample collection and analytical procedures are shown in Fig. 1B.
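The random division of the model development group can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the split was stratified by group (the text states only that it was random), derives the training fraction from the reported set sizes (166 of 221), and uses a hypothetical helper `stratified_split` with an arbitrary seed.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into training and validation sets, class by class."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for reproducibility
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


# Group sizes of the model development cohort as reported in the text.
labels = ["GC"] * 99 + ["PD"] * 30 + ["HC"] * 92

# Assumption: training fraction taken from the reported set sizes (166 of 221).
train_idx, valid_idx = stratified_split(labels, train_fraction=166 / 221)
print(len(train_idx), len(valid_idx))
```

Under these assumptions, each group contributes to both sets in roughly the same proportion, so the small PD group is not left out of the validation set by chance.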
Fig. 1.
The flowchart of study design. (A) Schematic illustration of cohort composition and study design. (B) Workflow of sample collection, GC–IMS analysis, and data processing. GC, gastric cancer; PD, precancerous disease; HC, healthy control; GC–IMS, gas chromatography-ion mobility spectrometry; VOCs, volatile organic compounds.
There were no significant differences in age, sex, height, weight or body mass index (BMI) among the gastric cancer, PD and HC groups (p > 0.05) (Table 1, Supplementary Tables S1–S3). Serum carcinoembryonic antigen (CEA) levels were significantly higher in the gastric cancer group than in the other two groups. Detailed demographic data and gastric cancer clinical staging information are presented in Table 1 and Supplementary Table S4.
Table 1.
Clinical characteristics of the entire cohort.
| Characteristics | Gastric cancer (N = 123) | PD (N = 38) | HC (N = 116) | p value |
|---|---|---|---|---|
| Age, median (IQR), years | 63 (55–69) | 65 (58.8–69.5) | 60 (52–71) | 0.165‡ |
| Sex (male/female) | 80/43 | 22/16 | 72/44 | 0.711§ |
| Height, mean ± SD, cm | 168.6 ± 1.0 | 166.9 ± 1.7 | 169.36 ± 0.8 | 0.428† |
| Weight, mean ± SD, kg | 67.9 ± 1.0 | 65.0 ± 2.1 | 67.7 ± 1.1 | 0.411† |
| BMI, mean ± SD, kg/m² | 23.9 ± 0.3 | 23.3 ± 0.7 | 23.6 ± 0.4 | 0.735† |
| CEA, median (IQR) | 2.4 (1.6–3.7) | 1.6 (0.9–2.6) | 1.5 (1.0–2.4) | < 0.001‡ |
†One-way ANOVA test; ‡Kruskal–Wallis Test; §χ2 test; BMI, body mass index; CEA, carcinoembryonic antigen; IQR, interquartile range; N, number.
GC–IMS based VOC profile analysis in gastric cancer, PD and HC
As shown in Fig. 2A, the substances in each serum sample were characterized by retention index (RI) and drift time (Dt) and quantified by peak height. To facilitate comparison of VOC changes in gastric cancer, the three-dimensional plots were projected into a two-dimensional top view (Fig. 2B), in which each point to the right of the reactant ion peak (RIP) represents a VOC and the color indicates the peak intensity. A total of 52 VOCs were identified based on RI and Dt (Supplementary Table S5). The levels of 19 of these VOCs differed significantly among the three groups according to the Kruskal–Wallis test (p < 0.05), and these VOCs were therefore selected for further analysis (Supplementary Table S6).
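For readers unfamiliar with the Kruskal–Wallis test, the following minimal sketch shows how the statistic can be computed for one VOC across three groups. It is an illustration only: the peak heights are hypothetical, the function name `kruskal_wallis_3groups` is ours, and the closed-form p-value relies on the fact that, with exactly three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exp(−H/2).

```python
import math


def kruskal_wallis_3groups(a, b, c):
    """Kruskal-Wallis H test for exactly three groups, with tie correction.

    Returns (H, p). The p-value uses the chi-square approximation with
    2 degrees of freedom, for which the survival function is exp(-H / 2).
    """
    groups = [a, b, c]
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += mid_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n**3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2.0)


# Hypothetical peak heights for a single VOC (not data from this study).
gc_peaks = [410.0, 455.0, 470.0]
pd_peaks = [300.0, 320.0, 350.0]
hc_peaks = [150.0, 180.0, 210.0]
h_stat, p_value = kruskal_wallis_3groups(gc_peaks, pd_peaks, hc_peaks)
print(round(h_stat, 3), round(p_value, 4))
```

In practice the test would be repeated for each of the 52 VOCs, and the chi-square approximation becomes more reliable as group sizes grow.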
Fig. 2.
Detection and analysis of serum VOC profiles in the gastric cancer, PD and HC groups. (A) The 3D spectral map generated by GC–IMS shows the peak intensity, retention index, and drift time of VOCs. (B) The 2D plot shows the differences in VOCs among gastric cancer, PD, and HC samples; each point represents a signal peak, positioned by the drift time and retention index of the corresponding VOC. (C) sPLS-DA was able to effectively distinguish gastric cancer samples from the PD and HC samples. (D) Unsupervised hierarchical clustering analysis also demonstrates significant differences among gastric cancer, PD, and HC samples. GC, gastric cancer; PD, precancerous disease; HC, healthy control; RIP, reactant ion peak; sPLS-DA, sparse partial least squares discriminant analysis.
Sparse partial least squares discriminant analysis (sPLS-DA) showed that the PD group formed a distinct cluster, whereas the gastric cancer and HC groups were partially separated along the second component (Fig. 2C). Unsupervised hierarchical clustering of the 19 VOCs mentioned above also revealed significant differences among the three groups (Fig. 2D).
Evaluation of six machine learning models for gastric cancer diagnosis based on serum VOCs
To evaluate the diagnostic performance of serum VOCs and identify the best classification model, six machine learning algorithms, namely random forest (RF), neural network (NN), support vector machine (SVM), decision tree (DT), naive Bayesian (NB), and K-nearest neighbor (KNN), were used to construct classification models based on the 19 VOCs. As shown in Fig. 3A, all six models distinguished gastric cancer, PD and HC with an accuracy of more than 70%. To apply binary evaluation metrics to this three-class problem, we reduced the three classes to two by comparing each group against the other two groups combined (one-vs-rest), as shown in Supplementary Fig. S1. Notably, the SVM model achieved areas under the curve (AUCs) of 0.945, 1.000 and 0.947 for recognizing gastric cancer, PD, and HC, respectively, and an accuracy of 94.6%, which was higher than that of the other five models (Fig. 3B). The detailed metrics of the six machine learning models are reported in Supplementary Table S7.
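The reduction of the three-class problem to two classes can be illustrated with a short sketch. It assumes that each group is compared against the other two combined and that the classifier yields a score per sample for the group of interest; the labels and scores below are hypothetical, and the helper names `one_vs_rest_labels` and `roc_auc` are ours. The AUC is computed from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
def one_vs_rest_labels(labels, positive):
    """Recode multiclass labels as 1 (the chosen class) or 0 (all other classes)."""
    return [1 if lab == positive else 0 for lab in labels]


def roc_auc(binary_labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(binary_labels, scores) if y == 1]
    neg = [s for y, s in zip(binary_labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and classifier scores for the gastric cancer class.
labels = ["GC", "GC", "PD", "HC", "HC", "GC"]
gc_scores = [0.9, 0.6, 0.2, 0.4, 0.7, 0.8]

gc_vs_rest = one_vs_rest_labels(labels, "GC")
auc_gc = roc_auc(gc_vs_rest, gc_scores)
print(round(auc_gc, 3))
```

Repeating the same recoding with "PD" and "HC" as the positive class gives one AUC per group, which is how three AUC values arise from a single three-class model.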
Fig. 3.
Predictive ability of the 19 differential VOCs based on six machine learning algorithms. (A) Confusion matrices of the models constructed by RF, SVM, NN, DT, NB, and KNN in the validation set to classify patients as gastric cancer, PD, and HC. (B) ROC curve analysis for the SVM model to identify gastric cancer, PD and HC in the validation set. RF, Random Forest; NN, Neural Network; SVM, Support Vector Machine; DT, Decision Tree; NB, Naive Bayesian; KNN, K-Nearest Neighbor; GC, gastric cancer; PD, precancerous disease; HC, healthy control; AUC, area under the curve.
Development of an 11-VOCs model for gastric cancer diagnosis using SVM
We next examined how many VOCs the SVM model required, in order to identify the optimal combination of VOCs for distinguishing between the disease and control groups. Based on the ranking of the mean importance of VOCs in the SVM model (Supplementary Fig. S2), the model reached its highest accuracy of 96.4% in the validation set when 11 variables (alpha-terpinolene, 2-butoxyethanol, benzaldehyde-M, ethyl acrylate, (E)-2-octenal-D, 1-octen-3-one-M, (E)-2-octenal-M, 2-furaldehyde, 2-propanone, (E)-2-heptenal-D, and methyl 2-furoate) were included (Fig. 4A), with AUCs of 0.964, 1.000 and 0.952 in distinguishing the three groups (Fig. 4B). Therefore, we selected the 11-VOCs model based on the SVM algorithm as the final classifier. Supplementary Fig. S3A–K shows the distributions of these 11 VOCs among the three groups: 4 VOCs ((E)-2-octenal-D, 1-octen-3-one-M, (E)-2-octenal-M and (E)-2-heptenal-D) were up-regulated and 2 VOCs (2-butoxyethanol and 2-furaldehyde) were down-regulated in gastric cancer patients.
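The selection logic described above (rank the VOCs by importance, add them one at a time, and keep the smallest set that reaches the highest accuracy) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: `select_top_k` is a hypothetical helper, the feature names are placeholders, and the `evaluate` callback stands in for what would presumably be fitting an SVM on the chosen VOCs and returning validation accuracy. The accuracies in the lookup table are invented for the example.

```python
def select_top_k(ranked_features, evaluate):
    """Evaluate nested subsets (top 1, top 2, ...) of an importance ranking.

    Returns the smallest subset achieving the best accuracy, that accuracy,
    and the full (subset size, accuracy) history for plotting.
    """
    best_subset, best_acc = [], float("-inf")
    history = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        acc = evaluate(subset)
        history.append((k, acc))
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc, history


# Placeholder feature names, already sorted from most to least important.
ranked = ["voc_a", "voc_b", "voc_c", "voc_d"]

# Hypothetical accuracies by subset size; a real evaluate() would train and
# score a classifier on the given features instead of using a lookup.
toy_accuracy = {1: 0.80, 2: 0.90, 3: 0.90, 4: 0.85}


def evaluate(subset):
    return toy_accuracy[len(subset)]


chosen, accuracy, history = select_top_k(ranked, evaluate)
print(chosen, accuracy)
```

The `history` list corresponds to the kind of accuracy-versus-number-of-VOCs curve shown in Fig. 4A.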
Fig. 4.
Identification of the best combination of VOCs for gastric cancer diagnosis. (A) Line plots show the accuracy of the model corresponding to the selection of different numbers of VOCs. (B, C) ROC curve analysis of SVM-based 11-VOCs model for discriminating gastric cancer, PD and HC in validation set (B) and test set (C). (D, E) Comparison of the diagnostic performance of SVM-based 11-VOCs model and CEA in identifying gastric cancer and other groups in validation set (D) and test set (E). GC, gastric cancer; PD, precancerous disease; HC, healthy control; AUC, area under the curve; SVM-11VOCs, 11-volatile organic compounds model based on the Support Vector Machine; CEA, carcinoembryonic antigen.
The diagnostic efficacy of the model was further evaluated in the independent test set, in which the 11-VOCs model achieved an accuracy of 92.9%, with AUCs of 0.932, 0.875 and 0.964 for the three groups (Fig. 4C). Importantly, in both the validation and test sets, the 11-VOCs model showed better diagnostic performance than the traditional gastric cancer marker CEA in differentiating patients with gastric cancer from those in the other groups (Fig. 4D–E). Additional data are available in Supplementary Table S8.
Performance of the SVM-based 11-VOCs model for the diagnosis of early gastric cancer (EGC)
Given the critical clinical importance of identifying cancer at its earliest stages, we further evaluated whether the established 11-VOCs model maintained its diagnostic performance for EGC. In the validation set, the model achieved an overall accuracy of 94.4% in correctly categorizing patients into the three groups. In terms of discriminating EGC patients from HC, all eight patients with EGC were correctly identified, with a sensitivity of 100.0% and a specificity of 92.9% (Supplementary Table S9). In the test set, the model obtained an overall accuracy of 92.3%, and all seven patients with EGC were diagnosed correctly, with a sensitivity of 100.0%, a specificity of 90.6%, and an AUC of 0.953 (Fig. 5 and Supplementary Table S9).
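The relationship between the reported sensitivity, specificity and accuracy can be checked with a small worked example. The counts below are not reported in the text; they are reconstructed under the assumption that the test-set comparison involved the 7 EGC patients and the 32 non-cancer subjects of the test group (8 PD and 24 HC), with 29 of the 32 classified as non-cancer. This reconstruction reproduces the reported percentages but should be read as an inference, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts (reconstructed, not taken from the source tables):
# 7 EGC patients all detected; 29 of 32 non-cancer subjects correctly excluded.
test_set = binary_metrics(tp=7, fn=0, tn=29, fp=3)
for name, value in test_set.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the three values come out at 100.0%, 90.6% and 92.3%, matching the figures given for the test set.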
Fig. 5.
Performance of SVM-based 11-VOCs model for early diagnosis of gastric cancer in the test set. (A) ROC curve analysis of the SVM-based 11-VOCs model to discriminate between the gastric cancer group and the other groups in the test set. (B) Binary confusion matrix for discriminating the gastric cancer group from the other groups by the SVM-based 11-VOCs model in the test set. AUC, area under the curve; VOCs, volatile organic compounds; EGC, early gastric cancer.
Discussion
In this study, we demonstrate that serum VOC profiling combined with machine learning may aid in classification of gastric cancer. Notable findings include: (1) using GC–IMS, we identified significant alterations in the serum VOC signature of gastric cancer patients; (2) by incorporating SVM algorithms, we developed an 11-VOC classifier that was evaluated in both a validation set and an independent test set; and (3) the 11-VOC model showed higher diagnostic accuracy than the conventional serum tumor marker CEA in distinguishing gastric cancer patients from others. Collectively, the 11-VOC SVM classifier represents a potential noninvasive tool for gastric cancer detection.
Prior research has established VOCs as potential biomarkers for gastric cancer diagnosis. For instance, Tong et al.9 used gas chromatography-mass spectrometry (GC–MS) coupled with multivariate analysis to profile exhaled breath VOCs, discriminating gastric cancer patients from those with gastric ulcers, chronic gastritis, and HC, and identifying 14 differential VOCs. More recently, a study analyzing exhaled VOCs from 157 participants by thermal desorption gas chromatography with triple quadrupole mass spectrometry developed a 6-VOC model using multiple machine learning algorithms (rule-based C5.0, NB, multivariate adaptive regression splines, SVM, extreme gradient boosting trees, and RF), which distinguished gastric cancer patients from HC with AUCs of 0.92 and 0.91 in the discovery and replication cohorts, respectively10. Extending these advances, we evaluated the utility of GC–IMS for VOC detection in gastric cancer. Relative to conventional mass spectrometry–based platforms, GC–IMS offers operational simplicity, rapid analysis, and minimal sample preparation, which could support scalable deployment in routine clinical practice. By integrating GC–IMS with an SVM classifier, we developed a serum VOC-based model that achieved 96.4% accuracy in the validation set, indicating that GC–IMS may be a viable complementary technique for VOC analysis in gastric cancer diagnostics.
Although exhaled breath has been widely examined, its clinical utility is limited by sensitivity to confounding factors such as smoking, diet, and environmental exposure11. In contrast, serum VOCs—released into circulation from tumor-associated metabolic alterations prior to pulmonary excretion—may more directly reflect systemic malignant changes12. Indeed, profiling of blood-based VOCs, including those measured in serum and plasma, has shown promising diagnostic performance in other malignancies13,14. Consistent with this premise, our serum VOC-based model demonstrated good diagnostic accuracy for gastric cancer. Moreover, serum is a routine clinical specimen with standardized collection protocols and reduced susceptibility to ambient contamination compared with exhaled breath. In this study, we implemented strict pre-analytical controls—including an 8-h fast, exclusion of participants with comorbidities or prior anticancer treatments, and a standardized laboratory environment—to minimize interference. While long-term dietary and lifestyle habits were not restricted, thereby introducing physiological variability, this approach enhances the model’s generalizability to real-world clinical settings where strict lifestyle standardization is often impractical.
CEA, a traditional serum tumor marker, is routinely used as an auxiliary in vitro diagnostic test for gastric cancer but has limited sensitivity, with positivity rates below 50% in advanced gastric cancer and under 10% in early-stage disease, which constrains its utility for early detection15. In contrast, our serum VOC-based model achieved an AUC of 0.964 for gastric cancer in the validation set, substantially outperforming CEA (AUC = 0.558) in the same set. Importantly, given the critical role of early detection in improving survival, our model achieved 94.4% accuracy for EGC in the validation set. These results suggest that VOC profiling may offer a noninvasive alternative to existing serum biomarkers for detecting gastric cancer, particularly occult early-stage lesions.
Notably, several of the eleven differential VOCs in this study have also been reported as altered in other cancer types, indicating shared metabolic dysregulation across malignancies. For example, 2-furaldehyde was significantly down-regulated in the serum of gastric cancer patients, consistent with findings in the urine of patients with clear cell renal cell carcinoma16. Altered levels of (E)-2-octenal and 1-octen-3-one were also found in the bile of gallbladder cancer patients17. In addition, benzaldehyde in exhaled breath and oral air has been implicated as a biomarker in oral squamous cell carcinoma, head and neck squamous cell carcinoma, and clear cell renal cell carcinoma16,18,19. The recurrent detection of these VOCs across tumor types likely reflects common metabolic disturbances in tumorigenesis, such as increased lipid peroxidation, elevated oxidative stress, and disordered aldehyde metabolism. In addition to these previously reported compounds, we identified altered serum levels of alpha-terpinolene, ethyl acrylate, methyl 2-furoate and (E)-2-heptenal-D in gastric cancer patients for the first time. This highlights the importance of a multi-VOC panel over single markers to enhance diagnostic specificity for gastric cancer.
Although the mechanisms linking VOCs to gastric cancer progression require further elucidation, several biological pathways may be involved. In this study, (E)-2-octenal-D, 1-octen-3-one-M, (E)-2-octenal-M and (E)-2-heptenal-D were elevated in gastric cancer serum, whereas the levels of 2-furaldehyde and 2-butoxyethanol were reduced. Specific enzymes likely influence the production of these VOCs. For example, aldehyde dehydrogenase (ALDH) activity is upregulated in gastric cancer stem cells and catalyzes the conversion of aldehydes to carboxylic acids20, which may explain the reduced serum levels of aldehydes such as 2-furaldehyde and benzaldehyde-M in gastric cancer patients. Similarly, elevated alcohol dehydrogenase (ADH) activity in gastric cancer tissues could promote the oxidation of alcohols to ketones, contributing to the accumulation of 1-octen-3-one-M21. Enhanced lipid peroxidation and fatty acid β-oxidation within tumor cells may also drive the generation of unsaturated aldehydes and ketones, including (E)-2-octenal and 1-octen-3-one-M. These metabolic shifts are closely associated with reactive oxygen species (ROS), which play a key role in tumorigenesis and cancer progression22.
Several limitations of this study should be noted. First, GC–IMS provides semi-quantitative data based on peak intensities rather than absolute VOC concentrations. While this is sufficient for classification, future validation using quantitative methods, such as GC–MS with external standards, is needed to determine absolute biomarker levels and facilitate clinical translation. Second, this was a single-center study. Although an independent test group was included, the sample size, particularly for the PD group, was relatively small compared with the gastric cancer and HC groups. This imbalance may limit the statistical power for characterizing the transition from precancerous lesions to cancer. Therefore, larger multi-center cohorts are required to validate the model’s robustness across diverse populations.
In summary, we constructed an SVM model incorporating 11 differential VOCs that showed improved diagnostic performance compared with the conventional marker CEA in gastric cancer detection. These findings suggest that serum VOC profiling may represent a noninvasive approach for gastric cancer detection, with particular potential for identifying early-stage disease.
Methods
Study subjects
A total of 277 subjects were recruited from Qilu Hospital of Shandong University, including 123 patients with gastric cancer, 38 patients with PD, and 116 HC. The gastric cancer group comprised patients with stage Ⅰ (30.9%), Ⅱ (32.5%), Ⅲ (30.9%) and Ⅳ (5.7%). The study protocol was approved by the Ethics Committee of Qilu Hospital of Shandong University (KYLL-202411-048-1) and was conducted following the guidelines of the Declaration of Helsinki. Informed consent was obtained from all eligible subjects.
The inclusion criteria for patients with gastric cancer were: (1) aged 18 years or older at enrollment; (2) diagnosed with gastric cancer by two independent physicians in the Department of Pathology, Qilu Hospital of Shandong University; (3) no history of other malignancies or anticancer treatment; (4) no family history of malignant tumors; (5) no comorbidities such as diabetes, fatty liver, kidney disease, cardiovascular disease or upper respiratory tract infection. All gastric cancer patients were staged according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system23. As a specific subgroup of the gastric cancer cohort, EGC was defined as an invasive carcinoma confined to the mucosa or submucosa (T1), with or without lymph node metastasis (any N).
The inclusion criteria for PD patients were: (1) aged 18 years or older; (2) diagnosed with chronic atrophic gastritis, chronic gastric ulcers, or adenomatous gastric polyps based on endoscopic and pathological examinations at Qilu Hospital of Shandong University; (3) no history of malignancies or other diseases outside the stomach.
The inclusion criteria for HC were: (1) aged 18 years or older; (2) underwent a physical examination at Qilu Hospital of Shandong University with normal biochemical indices of the liver and kidney; (3) no history of gastrointestinal diseases or gastroparesis; (4) no history of tumors or other major diseases.
Sample collection
Whole blood samples were collected from subjects after 8 h of fasting, and serum was separated in inert separating procoagulant tubes. All samples were collected preoperatively and showed no hemolysis or jaundice. The collected serum samples were first centrifuged at 6,000 × g for 10 min at 4 °C, and the supernatant was then transferred to a new tube and centrifuged again at 12,000 × g for 10 min at 4 °C. The final supernatant was immediately stored at −80 °C for no more than 3 months before analysis.
Analysis of VOCs in serum
Serum VOCs were analyzed by GC–IMS using a “FlavorSpec” instrument (G.A.S., Dortmund, Germany). Combining gas chromatography with ion mobility spectrometry provides a two-fold separation of VOCs and improves the analytical capability for compounds. The volatile components were first separated by gas chromatography according to their retention behavior, and the separated VOCs were then introduced into the IMS unit, where they were ionized and passed through the drift tube to the detector. Samples were injected in randomized order, loading was carried out independently by different operators, and the pretreatment and detection conditions were kept consistent for all samples, as follows. Serum samples were thawed at 4 °C, and 200 μL of sample was added to a headspace vial containing 0.02 g of aspartic acid and incubated at 60 °C for 10 min; 1 mL of headspace gas was then drawn into the GC–IMS system using the injection needle. High-purity nitrogen (99.99%) was used as both the carrier gas and the drift gas. The carrier gas flow rate was 2 mL/min for the first 2 min and was then increased to 150 mL/min within 15 min, and the drift gas flow rate was set at 150 mL/min. The temperatures of the drift tube, gas chromatography column, inlet, connection line 1, and connection line 2 were set to 45 °C, 80 °C, 80 °C, 80 °C and 45 °C, respectively.
The qualitative analysis of VOCs was performed using the built-in GC–IMS library integrated with the NIST2020 database. Specifically, n-ketones (C4–C9) (Sinopharm Chemical Reagents Co., Ltd., China) were used as external reference standards to calculate the RI for each compound. Identification was achieved by matching the calculated RI and Dt of the detected signals with the standard data in the library. Compounds that successfully matched the library database were identified and named according to the NIST spectral library. The detected signals were quantified using peak height. Prior to statistical analysis, the data were preprocessed using the VOCal software (version 0.4.12, https://www.gas-dortmund.de), specifically including baseline correction and normalization, to ensure data consistency across samples.
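To illustrate how a retention index can be derived from the n-ketone reference standards, the sketch below uses linear interpolation between the two ketones that bracket an analyte. This is one common convention and is shown as an assumption; the calculation performed by the instrument software may differ. The retention times are hypothetical, and `linear_retention_index` is our own helper name.

```python
def linear_retention_index(rt, reference):
    """Retention index by linear interpolation between bracketing standards.

    reference: list of (carbon_number, retention_time) for the n-ketones.
    A ketone with n carbons is assigned an index of 100 * n.
    """
    ref = sorted(reference, key=lambda item: item[1])
    for (n_lo, t_lo), (n_hi, t_hi) in zip(ref, ref[1:]):
        if t_lo <= rt <= t_hi:
            fraction = (rt - t_lo) / (t_hi - t_lo)
            return 100.0 * (n_lo + (n_hi - n_lo) * fraction)
    raise ValueError("retention time outside the calibrated range")


# Hypothetical retention times (seconds) for the C4-C9 n-ketone standards.
ketones = [(4, 100.0), (5, 150.0), (6, 230.0), (7, 350.0), (8, 520.0), (9, 760.0)]

# An analyte eluting halfway between the C5 and C6 ketones.
ri = linear_retention_index(190.0, ketones)
print(ri)
```

Expressing retention as an index rather than a raw time makes values comparable between runs, which is what allows matching against library entries together with the drift time.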
Statistical analysis
One-way ANOVA (for normally distributed data) or the Kruskal–Wallis test (for non-normally distributed data) was used to compare VOC levels among the gastric cancer, PD and HC groups, and the p-values were adjusted by the Benjamini–Hochberg method. sPLS-DA was performed using the mixOmics package to assess group separation. This method applies L1 regularization to select the most discriminative variables, enabling simultaneous dimensionality reduction and feature selection. Unsupervised hierarchical clustering was performed using the pheatmap package to visualize global VOC expression patterns across samples. Six machine learning algorithms (RF, NN, SVM, DT, NB, and KNN) were employed to evaluate the ability of models based on various VOCs to distinguish the gastric cancer, PD and HC groups. Based on tenfold cross-validation, the optimal parameters of the model were determined using the validation set, and the predictive performance was assessed using the independent test set. The performance of the models was evaluated based on accuracy, precision, specificity, recall and F1-score. Receiver operating characteristic (ROC) curves were constructed to assess the performance of VOCs in the differential diagnosis of gastric cancer and non-cancer patients. An SVM model with a radial basis function (RBF) kernel was trained, and the importance of VOCs was estimated using the varImp function. A forward selection strategy was then used to determine the optimal combination of VOCs, defined as the subset achieving the highest classification accuracy with the minimum number of variables.
All data were analyzed using IBM SPSS Statistics (version 22.0.0, IBM Corp., Armonk, N.Y., USA; https://www.ibm.com/products/spss-statistics) and R software (version 4.2.0, Vienna, Austria; https://www.r-project.org) and were expressed as mean ± standard deviation or median and interquartile range (IQR). A p-value < 0.05 was considered statistically significant for all tests.
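The Benjamini–Hochberg adjustment mentioned above can be illustrated with a short sketch. The study itself used SPSS and R; the Python below is only an illustration of the standard procedure, with hypothetical p-values and a helper name (`benjamini_hochberg`) of our own choosing.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four VOCs (not results from this study).
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])
```

Each raw p-value is scaled by the number of tests divided by its rank, so a VOC that is nominally significant can lose significance once the number of compounds tested is taken into account.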
Author contributions
Y.Z. (Yuxiao Zhao) and Y.X. wrote the main manuscript text and performed formal analyses, M.M. and N.N. conducted investigations and software work, X.Z. (Xin Zheng) and T.L. carried out methodology development and validation, X.Z. (Xin Zhang) and Y.Z. (Yi Zhang) contributed to manuscript review and editing, H.B.S. performed visualization and data curation, C.S. and F.L. provided resources and data curation, Y.Z. (Yanli Zhang) and Y.Z. (Yi Zhang) supervised the work and acquired funding, and all authors reviewed and approved the final manuscript.
Funding
This work was supported by the Natural Science Foundation of Shandong Province (ZR2021MH110), the National Natural Science Foundation of China (82572647), and the Taishan scholar program of Shandong Province (tstp20221156, tsqn202306346).
Data availability
The datasets supporting the conclusions of this article are included within the article and its additional files.
Declarations
Ethical approval
The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and was approved by the Ethics Committee of Qilu Hospital of Shandong University (KYLL-202411-048-1). Informed consent was obtained from all eligible subjects.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yuxiao Zhao, Yueming Xin and Mai Mao contributed equally to this work.
Contributor Information
Yanli Zhang, Email: zyl_2960@126.com.
Yi Zhang, Email: yizhang@sdu.edu.cn.
References
- 1. Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48. 10.3322/caac.21763 (2023).
- 2. Ugai, T. et al. Is early-onset cancer an emerging global epidemic? Current evidence and future implications. Nat. Rev. Clin. Oncol. 19, 656–673. 10.1038/s41571-022-00672-8 (2022).
- 3. Amann, A. et al. The human volatilome: Volatile organic compounds (VOCs) in exhaled breath, skin emanations, urine, feces and saliva. J. Breath Res. 8, 034001. 10.1088/1752-7155/8/3/034001 (2014).
- 4. Zhang, J. et al. Breath volatile organic compound analysis: An emerging method for gastric cancer detection. J. Breath Res. 10.1088/1752-7163/ac2cde (2021).
- 5. Riccio, G., Baroni, S., Urbani, A. & Greco, V. Mapping of urinary volatile organic compounds by a rapid analytical method using gas chromatography coupled to ion mobility spectrometry (GC–IMS). Metabolites 10.3390/metabo12111072 (2022).
- 6. Shen, X. et al. The detection of serum-volatile organic compounds in the diagnostics of hepatocellular carcinoma using gas chromatography-ion mobility spectrometry. Front. Chem. 13, 1672220. 10.3389/fchem.2025.1672220 (2025).
- 7. Qian, J., Liu, Q., Wang, J., Zhuang, X. & Fang, J. Identifying novel biomarkers for biliary tract cancer based on volatile organic compounds analysis and machine learning. Front. Oncol. 15, 1572460. 10.3389/fonc.2025.1572460 (2025).
- 8. Liu, Q. et al. Serum-volatile organic compounds in the diagnostics of esophageal cancer. Sci. Rep. 14, 17722. 10.1038/s41598-024-67818-9 (2024).
- 9. Tong, H. et al. Volatile organic metabolites identify patients with gastric carcinoma, gastric ulcer, or gastritis and control patients. Cancer Cell Int. 17, 108. 10.1186/s12935-017-0475-x (2017).
- 10. Chen, J. et al. Exhaled volatolomics profiling facilitates personalized screening for gastric cancer. Cancer Lett. 590, 216881. 10.1016/j.canlet.2024.216881 (2024).
- 11. Filipiak, W. et al. Dependence of exhaled breath composition on exogenous factors, smoking habits and exposure to air pollutants. J. Breath Res. 6, 036008. 10.1088/1752-7155/6/3/036008 (2012).
- 12. Lubes, G. & Goodarzi, M. GC-MS based metabolomics used for the identification of cancer volatile organic compounds as biomarkers. J. Pharm. Biomed. Anal. 147, 313–322. 10.1016/j.jpba.2017.07.013 (2018).
- 13. Martínez-Moral, M. P., Tena, M. T., Martín-Carnicero, A. & Martínez, A. Highly sensitive serum volatolomic biomarkers for pancreatic cancer diagnosis. Clin. Chim. Acta 557, 117895. 10.1016/j.cca.2024.117895 (2024).
- 14. Bhatt, A. et al. Volatile organic compounds in plasma for the diagnosis of esophageal adenocarcinoma: A pilot study. Gastrointest. Endosc. 84, 597–603. 10.1016/j.gie.2015.11.031 (2016).
- 15. Shimada, H., Noie, T., Ohashi, M., Oba, K. & Takahashi, Y. Clinical significance of serum tumor markers for gastric cancer: A systematic review of literature by the task force of the Japanese gastric cancer association. Gastric Cancer 17, 26–33. 10.1007/s10120-013-0259-5 (2014).
- 16. Pinto, J. et al. Urinary volatilomics unveils a candidate biomarker panel for noninvasive detection of clear cell renal cell carcinoma. J. Proteome Res. 20, 3068–3077. 10.1021/acs.jproteome.0c00936 (2021).
- 17. Zhang, X. et al. A panel of bile volatile organic compounds servers as a potential diagnostic biomarker for gallbladder cancer. Front. Oncol. 12, 858639. 10.3389/fonc.2022.858639 (2022).
- 18. Bouza, M., Gonzalez-Soto, J., Pereiro, R., de Vicente, J. C. & Sanz-Medel, A. Exhaled breath and oral cavity VOCs as potential biomarkers in oral cancer patients. J. Breath Res. 11, 016015. 10.1088/1752-7163/aa5e76 (2017).
- 19. Kok, R. et al. Breath biopsy, a novel technology to identify head and neck squamous cell carcinoma: A systematic review. Oral Dis. 29, 3034–3048. 10.1111/odi.14305 (2023).
- 20. Wu, D. et al. Aldehyde dehydrogenase 3A1 is robustly upregulated in gastric cancer stem-like cells and associated with tumorigenesis. Int. J. Oncol. 49, 611–622. 10.3892/ijo.2016.3551 (2016).
- 21. Jelski, W. & Szmitkowski, M. Alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH) in the cancer diseases. Clin. Chim. Acta 395, 1–5. 10.1016/j.cca.2008.05.001 (2008).
- 22. Monedeiro, F., Monedeiro-Milanowski, M., Zmysłowski, H., De Martinis, B. S. & Buszewski, B. Evaluation of salivary VOC profile composition directed towards oral cancer and oral lesion assessment. Clin. Oral Investig. 25, 4415–4430. 10.1007/s00784-020-03754-y (2021).
- 23. Amin, M. B. et al. The Eighth Edition AJCC cancer staging manual: Continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J. Clin. 67, 93–99. 10.3322/caac.21388 (2017).
Data Availability Statement
The datasets supporting the conclusions of this article are included within the article and its additional files.





