Abstract
Background
Lung cancer remains the leading cause of cancer-related deaths worldwide, with early detection significantly improving survival. Lung nodules are a common finding, both as incidental solitary pulmonary nodules (SPNs) and in lung cancer screening programmes. Accurately distinguishing benign from malignant nodules, particularly small ones, remains challenging. Determining which nodules require further investigation is crucial for optimising early lung cancer detection and reducing unnecessary procedures. We therefore analysed volatile organic compounds (VOCs) in exhaled breath using multicapillary column/ion mobility spectrometry (MCC/IMS) to assess whether they can differentiate malignant from benign SPNs and serve as potential biomarkers for lung cancer.
Methods
A total of 65 patients with an incidental, solitary pulmonary malignant (n=24) or benign (n=41) nodule were prospectively included. Two models were developed: a pre-computed tomography (CT) scan model for triaging high-risk individuals prior to imaging, and a post-CT scan model to assist with nodule management and follow-up.
Results
Four VOCs (VOC37, VOC46, VOC58 and VOC128) were identified as key compounds, showing strong diagnostic performance (area under the curve 0.900 for pre-CT scan and 0.897 for post-CT scan).
Conclusions
Our findings suggest that breath analysis could improve nodule management by refining patient selection for CT screening and reducing unnecessary invasive procedures or follow-up scans. Further validation through larger multicentre studies is needed to confirm these results.
Shareable abstract
Breath analysis using VOCs measured by MCC/IMS shows high accuracy in distinguishing malignant from benign lung nodules, offering a noninvasive tool to improve early lung cancer detection, and reduce unnecessary imaging and procedures https://bit.ly/45Iis71
Introduction
Lung cancer (LC) is the leading cause of cancer-related deaths, responsible for nearly one-quarter of all such deaths worldwide [1]. Early-stage disease has a 5-year survival rate of up to 80%, but it is often asymptomatic, leading to late diagnoses in 75% of cases [2]. Screening with low-dose computed tomography (LDCT) has improved early detection and reduced LC mortality by 20–26% [3, 4].
Currently, eligibility for LDCT screening is based on risk factor questionnaires [3, 4] and predictive models [5–7]. These models typically incorporate clinical characteristics such as age, body mass index (BMI) and pack-years, which have been consistently identified as important predictors of LC risk. The National Lung Screening Trial (NLST) [3] and the Liverpool Lung Project (LLP) risk model [5] both highlight age and smoking history (pack-years) as significant factors in risk stratification. Furthermore, the HUNT study [6] and Tammemagi et al. [7] have emphasised that BMI can also contribute to LC risk, with a higher BMI being associated with a greater risk of malignancy in certain cohorts.
However, LDCT frequently detects small nodules of unknown significance, leading to a low positive predictive value (PPV) (2.4–5.2%) and a high false-positive rate. Nearly one-quarter of participants received false-positive results over three rounds of screening [5], while overdiagnosis was estimated at 18% [8].
Solitary pulmonary nodules (SPNs) are commonly detected in both screening and routine chest CT scans, with incident nodules found in 31% of patients and up to 51% of smokers [9, 10]. SPNs, defined as lung opacities up to 3 cm, have a broad differential diagnosis, including benign and malignant conditions [11]. Risk stratification is based on clinical parameters and imaging features, but small nodules remain challenging to characterise due to overlapping morphologic traits [12]. Current guidelines recommend CT follow-up for pulmonary nodules, with nodule size being a key factor in determining the need for further imaging. However, since many of these nodules are benign, this often leads to unnecessary imaging, invasive procedures and increased patient anxiety [13, 14].
Exhaled breath contains volatile organic compounds (VOCs) that reflect metabolic activity, including processes linked to malignancy, such as inflammation and oxidative stress. To optimise nodule management, both before and after detection on CT scans, exhaled breath analysis could be considered [15].
Previous studies examined VOCs across various human matrices as potential LC biomarkers [16] and breath analysis has been explored as a noninvasive tool to differentiate LC patients from healthy controls [17–23]. Additionally, our group previously showed that breath analysis could discriminate patients with LC from asbestos-exposed individuals and patients with mesothelioma [24, 25]. Therefore, it is expected that breath analysis could also aid in classifying nodules, which could result in less patient distress, decreased use of invasive diagnostic procedures and fewer (follow-up) CT scans or an increased time interval between follow-up examinations, subsequently reducing radiation dose.
In this discovery study, we aim to determine the feasibility of using breath analysis in a pre- and post-CT scan setting in the process of evaluating incidental nodules for LC. The pre-CT scan model focuses on triaging individuals who might require a CT scan based on their risk factors, refining the selection of participants for screening. Meanwhile, the post-CT scan model aims to help with monitoring and follow-up of previously detected nodules, distinguishing between benign and malignant growths, and potentially reducing the frequency of invasive diagnostic procedures and radiation exposure.
Materials and methods
Study design and participants
This prospective study adhered to the Declaration of Helsinki and was approved by the Antwerp University Hospital Ethics Committee (Belgian registration number B300201837007). Inclusion rate was based on sample size estimation using power analysis to detect a significant difference in VOC profiles (α=0.05, power=80%). Participants were referred for inclusion after discussion in the Multidisciplinary Thoracic Oncology Tumour Board if they presented with an SPN >6 mm or, if smaller, exhibited radiological features strongly suggestive of malignancy, including spiculation, irregular margins, high attenuation, growth or solid components. Exclusion criteria included fully calcified nodules, prior sarcoma, renal cell carcinoma, melanoma, immunosuppression or suspected extra-thoracic metastases. Malignancy was confirmed histologically, whereas benignity was established through stability on follow-up imaging per international guidelines [13, 14].
Informed consent was obtained, and baseline clinical data (age, sex, BMI, smoking history and pack-years) were collected. Follow-up CT scans were conducted at the discretion of the treating physician.
Breath sampling and analysis
Breath sampling followed European recommendations for exhaled VOC collection [26]. Participants fasted and refrained from smoking and oral hygiene activities for 2 h before sampling. Prior to sampling, they rinsed their mouths with distilled water and wore gloves and a nose clip during the sampling procedure. Sampling used a multicapillary column/ion mobility spectrometry (MCC/IMS; BreathDiscovery Second Generation, B. Braun, Dortmund, Germany) coupled with an integrated capnography-based breath sampler (SpiroScout, Ganshorn Medizin Electronic, Germany) [27, 28].
Participants breathed normally for 3 min through a ScoutTube mouthpiece connected to bacterial and viral filters and the MCC/IMS sample loop. A 10-mL alveolar air sample was collected and transferred to a nonpolar OV-5 multicapillary column. VOCs were ionised via a 95 MBq 63Ni-radiation source and separated in a drift tube using α1-nitrogen gas (Air Liquide, Belgium). Ion mobility properties were used for pseudo-identification via a Faraday plate detector. A matched 10-mL background air sample was collected for ambient air correction. MCC/IMS was flushed between participants, and disposable components minimised contamination.
Data processing
Breath and background samples were analysed using VisualNow software (v.3.9, B. Braun, Dortmund, Germany). Raw two-dimensional (2D) chromatograms were aligned, denoised (5×3 low-pass filter), normalised to the reactant ion peak (RIP) and corrected for RIP-tailing. VOCs were manually selected and their peak intensities were extracted. The alveolar gradient (breath sample minus background) was used as the predictor variable for statistical and machine-learning models.
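The alveolar gradient described above can be illustrated with a minimal sketch. The function name, the dictionary layout and the intensity values are hypothetical and are not taken from VisualNow; treating a peak that is absent from the background sample as zero is also an assumption.

```python
from typing import Dict


def alveolar_gradient(breath: Dict[str, float],
                      background: Dict[str, float]) -> Dict[str, float]:
    """Return breath minus background intensity for each VOC peak.

    Peaks missing from the background sample are treated as 0.0
    (an assumption made for this sketch only).
    """
    return {voc: intensity - background.get(voc, 0.0)
            for voc, intensity in breath.items()}


# Hypothetical RIP-normalised peak intensities for one participant.
breath = {"VOC37": 0.012, "VOC46": 0.020}
background = {"VOC37": 0.003, "VOC46": 0.025}
gradient = alveolar_gradient(breath, background)
```

A negative gradient, as for the hypothetical VOC46 value here, means the peak was more intense in ambient air than in the breath sample.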
Tentative VOC identification was performed by overlaying chromatograms with a reference library of known compounds provided by the manufacturer, based on retention time and ion mobility spectra.
Statistics and machine-learning models
Normality was assessed using the Shapiro–Wilk test. Normally distributed variables (p>0.05) were compared using independent t-tests, and non-normally distributed variables (p≤0.05) were analysed using Wilcoxon rank-sum tests. Features with p<0.1 were included in multivariate modelling.
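As an illustration of the screening step, the sketch below implements only the non-parametric branch: a two-sided Wilcoxon rank-sum (Mann–Whitney U) test using the normal approximation with tie correction, followed by the p<0.1 filter. It is a standard-library approximation, not the R routine used in the study; the Shapiro–Wilk and t-test branches are omitted, and all function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for x, p-value)."""
    counts = Counter(list(x) + list(y))
    # Assign average ranks to tied values.
    avg_rank, next_rank = {}, 1
    for value in sorted(counts):
        c = counts[value]
        avg_rank[value] = next_rank + (c - 1) / 2
        next_rank += c
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


def screen_features(benign, malignant, alpha=0.1):
    """Keep features whose rank-sum p-value is below alpha."""
    return [name for name in benign
            if rank_sum_test(benign[name], malignant[name])[1] < alpha]


# Hypothetical alveolar gradients for two features.
benign = {"VOC_A": [1, 2, 3], "VOC_B": [1, 4, 5]}
malignant = {"VOC_A": [4, 5, 6], "VOC_B": [2, 3, 6]}
u_stat, p_value = rank_sum_test(benign["VOC_A"], malignant["VOC_A"])
kept = screen_features(benign, malignant)
```

For samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch dependency-free.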
Two feature selection methods were compared: backward logistic regression and penalised logistic regression. Backward logistic regression, performed using the MASS R-package (v.7.3-60), used the Akaike information criterion for stepwise feature elimination. Penalised logistic regression, implemented with the glmnet R-package (v.4.1-7), applied LASSO with leave-one-out cross-validation (LOOCV), considering features selected in more than 80% of LOOCV folds as robust. Only VOCs retained by both methods were included in the final model.
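The consensus rule in this paragraph (features selected in more than 80% of LOOCV folds, then intersected with the backward-regression result) can be sketched as follows. The per-fold selections are made up for illustration, and the fitting itself (glmnet, MASS) is not reproduced; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def stable_features(fold_selections: List[Set[str]],
                    threshold: float = 0.80) -> Set[str]:
    """Features selected in strictly more than `threshold` of the folds."""
    counts = Counter(f for selected in fold_selections for f in selected)
    n_folds = len(fold_selections)
    return {f for f, c in counts.items() if c / n_folds > threshold}


def consensus_features(fold_selections: List[Set[str]],
                       backward_selected: Iterable[str],
                       threshold: float = 0.80) -> Set[str]:
    """Intersection of fold-stable LASSO features and backward-selected ones."""
    return stable_features(fold_selections, threshold) & set(backward_selected)


# Hypothetical selections from 10 folds (the study used one fold per patient).
folds = []
for i in range(10):
    selected = {"VOC37"}            # selected in 10/10 folds
    if i < 9:
        selected.add("VOC46")       # 9/10 folds
    if i < 8:
        selected.add("VOC80")       # 8/10 folds: not strictly above 80%
    if i < 3:
        selected.add("VOC5")        # 3/10 folds
    folds.append(selected)

final = consensus_features(folds, backward_selected={"VOC37", "VOC46", "VOC89"})
```

Requiring agreement between both methods is conservative: a feature must survive both the stability filter and the stepwise elimination to enter the final model.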
In the pre-CT scan model, the selected VOCs were combined with clinical characteristics, including age, BMI and pack-years, to predict malignancy before imaging. These characteristics have been widely used in risk stratification models for LC screening, such as those developed in the National Lung Screening Trial (NLST) [3], the LLP risk model [5] and others [6, 7]. Age and smoking history (pack-years) are consistently cited as strong predictors of LC risk, whereas BMI has also been shown to correlate with cancer susceptibility, particularly in certain populations.
The post-CT scan model combined VOCs with nodule size, a widely used clinical parameter for determining the need for follow-up, to enhance malignancy assessment and minimise unnecessary follow-up procedures [13, 14].
Receiver operating characteristic (ROC) curves were generated using the ROCR R-package (v.1.0-11) to evaluate model performance. Sensitivity, specificity, positive and negative predictive values (PPVs/NPVs), accuracy and area under the curve (AUC) were estimated. The optimal cut-off was determined using Youden's index [29].
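To make the cut-off selection concrete, the following sketch computes the AUC as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign case, and picks the threshold that maximises Youden's index J = sensitivity + specificity − 1. It is a plain-Python illustration rather than the ROCR workflow used in the study, and the scores and labels are invented.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every distinct score,
    calling a case positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Threshold maximising J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """Rank-based AUC; ties between a positive and a negative count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities (1 = malignant, 0 = benign).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
model_auc = auc(scores, labels)
threshold, sensitivity, specificity = youden_cutoff(scores, labels)
```

Youden's index weights sensitivity and specificity equally; a triage setting that prioritises ruling out malignancy might justify a different operating point.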
Likelihood ratios (LR+ and LR−) were calculated to assess diagnostic value, with a higher LR+ increasing the post-test probability (PTP) of malignancy and a lower LR− reinforcing the likelihood of benignity [30]. Boxplots were used to visualise the distribution of VOCs across benign and malignant groups.
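The relationship between likelihood ratios and post-test probability can be written out as a short sketch. The formulas are the standard ones (LR+ = sensitivity/(1 − specificity), LR− = (1 − sensitivity)/specificity, post-test odds = pre-test odds × LR). The inputs below are the rounded pre-CT scan model values from table 3, and taking the study's own proportion of malignant nodules (24/65) as the pre-test probability is an assumption of this example.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-). LR+ is infinite when specificity is 1."""
    lr_pos = (float("inf") if specificity == 1
              else sensitivity / (1 - specificity))
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Rounded pre-CT scan model values from table 3.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.792, specificity=0.854)
prevalence = 24 / 65  # proportion of malignant nodules in this cohort
ptp_positive = post_test_probability(prevalence, lr_pos)
ptp_negative = post_test_probability(prevalence, lr_neg)
```

Because the pre-test probability enters through the odds, the same likelihood ratios give much lower post-test probabilities in populations where malignancy is rare.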
Spearman's correlations examined associations between VOC intensity and continuous variables, including age, BMI, nodule size and pack-years. Correlograms were plotted using the corrplot R-package (v.0.92). Two-way ANOVA was performed to investigate relationships between VOC concentrations and categorical variables. Continuous variables were compared using the Wilcoxon rank-sum test, and categorical variables were analysed using Pearson's chi-squared test.
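For readers unfamiliar with Spearman's coefficient, a minimal sketch is given below: both variables are converted to average ranks and Pearson's correlation is computed on the ranks. This is a textbook implementation using the standard library only, not the routine used for the correlograms, and the example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical nodule sizes (mm) and alveolar gradients of one VOC.
nodule_size = [6, 9, 12, 18, 24]
voc_gradient = [0.002, 0.004, 0.003, 0.009, 0.015]
rho = spearman_rho(nodule_size, voc_gradient)
```

Because only ranks are used, the coefficient captures monotonic rather than strictly linear association, which suits skewed intensity data.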
Results
Patient characteristics
A total of 65 patients with an incidental, solitary pulmonary nodule were prospectively included from 2 May 2022 until 2 December 2024. Of the 65 pulmonary nodules, 24 (36.92%) were malignant after tissue examination, 40 (61.54%) were benign based on clinical parameters such as morphology in serial follow-up, and 1 (1.54%) was proven benign through tissue examination. Table 1 presents a comparison of various clinical characteristics between patients with benign and malignant nodules. Significant differences were observed in age (p=0.032), smoking status (p=0.031), nodule size (p<0.001) and duration of follow-up (p=0.005). Patients with malignant nodules were older, were more often current smokers, and had larger nodules and a shorter follow-up duration. No significant differences were found for BMI (p=0.813), pack-years (p=0.106) or sex (p=0.168).
TABLE 1.
Overview of the baseline clinical characteristics of the included participants
| | Benign nodule | Malignant nodule | p-value |
|---|---|---|---|
| Subjects | 41 | 24 | |
| Female | 22.67% | 20.00% | 0.168# |
| Male | 77.33% | 80.00% | |
| Age, years | 64.33 (56.38–69.45) | 70.18 (63.77–76.41) | 0.032¶ |
| BMI, kg·m−2 | 25.95 (23.18–28.01) | 24.54 (22.90–29.84) | 0.813¶ |
| Smoking status | | | 0.031# |
| Never | 20 (48.78%) | 4 (16.67%) | |
| Current | 4 (9.76%) | 5 (20.83%) | |
| Former | 17 (41.46%) | 15 (62.50%) | |
| Pack-years | 0.75 (0.00–30.00) | 21.50 (0.45–33.50) | 0.106¶ |
| SCLC | NA | 0% | |
| NSCLC | | | |
| AC | NA | 72.73% | |
| SCC | NA | 36.37% | |
| LCC | NA | 0% | |
| Nodule size, mm | 9.00 (6.00–12.00) | 17.50 (13.00–23.75) | <0.001¶ |
| Duration of follow-up, years | 2.34 (1.35–2.93) | 0.68 (0.17–1.48) | 0.005¶ |
Data are presented as n, n (%) or median (Q1–Q3). BMI: body mass index; SCLC: small-cell lung cancer; NA: not applicable; NSCLC: non-small-cell lung cancer; AC: adenocarcinoma; SCC: squamous cell carcinoma; LCC: large cell carcinoma. #: Pearson's chi-squared test. ¶: Mann–Whitney U-test.
Feature selection: LASSO and backward regression
A total of 134 VOCs were detected from breath and background samples. Investigators were blinded to the CT results and radiographical high-risk features during VOC analysis. Thirteen VOCs differed between groups at the p<0.1 inclusion threshold, as determined by an independent t-test or a Mann–Whitney U-test. Following this, feature selection using backward logistic regression and penalised logistic regression with LOOCV was performed to identify the most influential VOCs (table 2). Four VOCs (VOC37, VOC46, VOC58 and VOC128) were selected for further analysis because they were retained by both methods. The selected VOCs were used in the final binary classification model (supplementary table S1).
TABLE 2.
Specifications of identified volatile organic compounds
| Variable | RT (s) | 1/K0, V·s·cm−2 | Tentative identification | Functional group | Alveolar gradient, benign nodule | Alveolar gradient, malignant nodule | p-value (group comparison) | β-coefficient | p-value (univariate logistic regression) |
|---|---|---|---|---|---|---|---|---|---|
| VOC5 | 34.5 | 0.592 | D-limonene | Terpene | 0.004 (0.002–0.008) | 0.007 (0.004–0.008) | 0.096# | 92.941 | 0.054+ |
| VOC36 | 7.9 | 0.714 | 1-Butanol | Alcohol | 0.006 (0.002–0.007) | 0.009 (0.005–0.015) | 0.008# | 180.248 | 0.003+ |
| VOC37 | 6.1 | 0.711 | Pentanal | Aldehyde | 0.004 (0.002–0.007) | 0.009 (0.004–0.015) | 0.005¶ | 179.593 | 0.004+ |
| VOC40 | 6.1 | 0.650 | 2-Methylfuran | Heterocyclic ether | 0.007 (0.006–0.011) | 0.010 (0.006–0.014) | 0.084¶ | 70.616 | 0.096+ |
| VOC46 | 11.6 | 0.458 | Ammonia | Inorganic compound | 0.015 (0.003–0.028) | 0.001 (0.004–0.011) | 0.003¶ | −43.039 | 0.011+ |
| VOC58 | 106.1 | 0.708 | (1α,2β,5α)-5-Methyl-2-(1-methylethyl)cyclohexanol | Alcohol | 0.001 (0.000–0.002) | 0.002 (0.001–0.003) | 0.006¶ | 473.414 | 0.011+ |
| VOC80 | 7.1 | 0.550 | 2-Pentanone | Ketone | 0.039 (0.005–0.060) | 0.075 (0.043–0.136) | 0.001¶ | 19.896 | 0.002+ |
| VOC89 | 7.3 | 0.559 | Dimethyl disulfide | Disulfide | 0.013 (0.002–0.022) | 0.022 (0.011–0.042) | 0.018# | 31.979 | 0.016+ |
| VOC112 | 5.2 | 0.608 | 2-Butanone | Ketone | 0.010 (0.005–0.013) | 0.014 (0.007–0.019) | 0.052¶ | 51.823 | 0.045+ |
| VOC116 | 20.2 | 0.540 | Phenol | Phenol | 0.006 (0.004–0.012) | 0.010 (0.007–0.015) | 0.033# | 30.118 | 0.191+ |
| VOC126 | 111.7 | 0.718 | 1-Dodecene | Alkene | 0.005 (0.002–0.007) | 0.007 (0.005–0.010) | 0.056¶ | 140.638 | 0.055+ |
| VOC128 | 8.6 | 0.549 | 3-Pentanone | Ketone | 0.039 (0.007–0.063) | 0.075 (0.046–0.136) | 0.001# | 20.909 | 0.001+ |
| VOC131 | 12.3 | 0.540 | Phenylacetylene | Alkyne | 0.033 (0.017–0.049) | 0.049 (0.034–0.072) | 0.077# | 6.861 | 0.355+ |
Data are presented as median (Q1–Q3). VOC: volatile organic compound; RT: retention time; 1/K0: inverse reduced ion mobility. #: Mann–Whitney U-test. ¶: Independent two-sample t-test. +: Univariate logistic regression.
The alveolar gradient of VOC37, VOC58 and VOC128 was significantly higher in patients with malignant nodules compared with those with benign nodules, whereas VOC46 showed a significantly higher alveolar gradient in benign nodules (figure 1).
FIGURE 1.
Visual representation of the alveolar gradient of volatile organic compounds (VOCs) selected by both backward and penalised regression, namely a) VOC37, b) VOC46, c) VOC58 and d) VOC128. p-values were calculated using an independent t-test or Wilcoxon test. **: p<0.01; ***: p<0.001.
Pre-CT scan and post-CT scan models
The predictive performance of the pre-CT scan and post-CT scan models was evaluated using VOCs combined with clinical data, and ROC curves were generated (figure 2). The pre-CT scan model combined VOCs with clinical features, including age, BMI and pack-years. The post-CT scan model used VOCs combined with nodule size. The pre-CT scan model demonstrated an accuracy of 0.831, with a sensitivity of 0.792, specificity of 0.854 and AUC of 0.900 (table 3). Following a positive pre-CT scan test, the PTP of malignancy increased to 0.760, whereas following a negative test the PTP of malignancy decreased to 0.125 (supplementary table S2).
FIGURE 2.
Receiver operating characteristic (ROC) curves demonstrating the ability to detect malignancy a) in the pre-computed tomography (CT) scan model and b) in the post-CT scan model. The diagonal grey line represents an uninformative test corresponding to a random chance diagnosis. p-values were calculated by DeLong's test for comparing the areas under two correlated ROC curves and are indicated by a grey bar. *: p<0.05; **: p<0.01. VOC: volatile organic compound.
TABLE 3.
Overview of the performance characteristics of the different prediction models
| | Selected VOCs | Pre-CT features | Post-CT feature | Pre-CT scan model | Post-CT scan model |
|---|---|---|---|---|---|
| Accuracy | 0.862 (0.757–0.925) | 0.662 (0.540–0.765) | 0.800 (0.687–0.879) | 0.831 (0.722–0.903) | 0.908 (0.813–0.957) |
| Sensitivity | 0.625 (0.427–0.788) | 0.708 (0.508–0.851) | 0.792 (0.595–0.908) | 0.792 (0.595–0.908) | 0.792 (0.595–0.908) |
| Specificity | 1.000 (0.914–1.000) | 0.634 (0.481–0.764) | 0.805 (0.660–0.898) | 0.854 (0.716–0.931) | 0.976 (0.874–0.996) |
| PPV | 1.000 (0.796–1.000) | 0.531 (0.365–0.691) | 0.704 (0.515–0.842) | 0.760 (0.566–0.885) | 0.950 (0.764–0.991) |
| NPV | 0.820 (0.692–0.902) | 0.788 (0.623–0.893) | 0.868 (0.727–0.943) | 0.875 (0.739–0.945) | 0.889 (0.765–0.952) |
| AUC | 0.860 (0.762–0.860) | 0.675 (0.538–0.675) | 0.854 (0.762–0.854) | 0.900 (0.829–0.900) | 0.897 (0.802–0.897) |
Data are presented as proportions with 95% confidence interval. Pre-CT features: age, BMI and pack-years; Post-CT feature: nodule size. VOC: volatile organic compound; CT: computed tomography; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
For the post-CT scan model, the accuracy was 0.908, with a sensitivity of 0.792, specificity of 0.976 and an AUC of 0.897. The post-CT scan model showed an increase in the positive PTP of malignancy to 0.950 following a positive test and a decrease of the negative PTP to 0.111.
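The reported performance figures can be cross-checked with a short worked example. The confusion-matrix counts below are not stated in the text; they are back-calculated from the reported proportions and the group sizes (24 malignant, 41 benign) and should be read as an inference. Likewise, the Wilson score interval is used because it appears to agree with the reported 95% confidence intervals, not because the source names it.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int):
    """Point estimates from a 2x2 table; negative PTP is 1 - NPV."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "negative_ptp": fn / (tn + fn),
    }


# Inferred counts (assumption): 24 malignant and 41 benign nodules.
pre_ct = diagnostic_metrics(tp=19, fn=5, tn=35, fp=6)
post_ct = diagnostic_metrics(tp=19, fn=5, tn=40, fp=1)
pre_ct_accuracy_ci = wilson_interval(19 + 35, 65)
post_ct_ppv_ci = wilson_interval(19, 19 + 1)
```

Under these inferred counts, the two models share the same sensitivity (19 of 24 malignant nodules detected) and differ only in the number of false positives (6 versus 1), which accounts for the difference in specificity and PPV.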
DeLong's test revealed significant differences in AUCs between the pre-CT scan model and the model based on pre-CT clinical features (age, BMI and pack-years; p=0.0014). No significant differences were observed between the pre-CT scan model and the model based on VOCs alone (p=0.5127). For the post-CT scan model, no significant differences were found when compared with the model based on post-CT clinical features (nodule size; p=0.2508) or the model based on VOCs alone (p=0.5898).
To determine the clinical value of the pre-CT scan model, we selected a subpopulation of participants that would have been eligible for screening, namely smokers or former smokers aged 55–74 years with specific smoking histories [3]. A total of 17 participants were deemed eligible and model performance for this subgroup is shown in table 4. Their clinical characteristics can be found in supplementary table S3.
TABLE 4.
Overview of the performance characteristics of the pre-computed tomography scan model after selection of a subpopulation that is eligible for screening
| Metric | Estimate (95% CI) |
|---|---|
| Accuracy | 0.800 (0.490–0.943) |
| Sensitivity | 1.000 (0.646–1.000) |
| Specificity | 1.000 (0.676–1.000) |
| PPV | 0.778 (0.453–0.937) |
| NPV | 0.882 (0.657–0.967) |
| AUC | 0.900 (0.748–1.000) |
Data are presented as proportions with 95% confidence interval. PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
Correlation between clinical characteristics and VOCs
Correlations between continuous clinical characteristics and selected VOCs were assessed and are presented in figure 3. Age was positively associated with VOC128 (r=0.26, p=0.033) but showed no significant association with VOC46 (r=−0.09, p=0.465). Nodule size correlated positively with VOC37 (r=0.31, p=0.011) and VOC128 (r=0.25, p=0.044), and negatively with VOC46 (r=−0.33, p=0.008). VOC46 was negatively correlated with VOC128 (r=−0.37, p=0.002). No significant associations were found between smoking status or sex and the identified VOCs (supplementary table S4).
FIGURE 3.

Correlation plot using Spearman's correlation test of clinical variables measured in the study population. Rank order correlation values are shown from green (−1.0) to blue (1.0). Values are indicated by colour. *: p<0.05; **: p<0.01; ***: p<0.001. BMI: body mass index; VOC: volatile organic compound.
Discussion
Breathomics has gained attention as a promising field for detecting various diseases, including LC, through VOCs. To mitigate the issue of excessive follow-up of incidentally found pulmonary nodules due to false-positive results and to improve triage decisions on whether a CT scan is necessary prior to nodule detection, this study evaluated the feasibility of breath analysis as a supplemental diagnostic tool with pre- and post-CT scan models (figure 4).
FIGURE 4.
Visualisation of the use of the pre- and post-computed tomography (CT) scan models in the flow of nodule detection through both lung cancer screening and non-screening clinical use of imaging. BMI: body mass index.
Breath samples from 65 patients with incidental SPNs were analysed for VOCs using MCC/IMS, a fast, sensitive, and portable method suited for clinical use [31, 32]. VOC37, VOC46, VOC58 and VOC128 were selected after feature selection using penalised and backward regression, offering a potential biomarker panel for nodule patients.
The pre-CT scan model, combining clinical factors (age, pack-years and BMI) with VOCs, demonstrated strong predictive performance (AUC, 0.90). This was a significant improvement over the clinical model alone (AUC, 0.67), highlighting the added value of breath analysis. The PTPs further reinforced its clinical relevance: a positive result substantially increased the likelihood of malignancy (PTP, 0.760), prioritising high-risk individuals for further diagnostic workup. Conversely, the negative PTP (0.125) indicated that a negative result substantially reduced, but did not entirely rule out, malignancy. Given its ability to integrate easily obtainable clinical parameters with a noninvasive breath test, this model could be particularly useful in primary care or resource-limited settings, where CT scans may not be readily accessible. By improving risk stratification before imaging, this approach has the potential to reduce unnecessary scans while ensuring that high-risk individuals are promptly identified for further investigation. As PTPs depend on disease prevalence, the utility of our breath test varies across populations. In a high-risk group, such as long-term smokers (with an LC prevalence of 1–2%), a positive result increases the probability of malignancy significantly. In contrast, in the general population (0.12% prevalence [33]), the positive PTP would be lower (≈0.31%). Thus, our breath test is most effective in high-risk groups where it can help prioritise individuals for CT screening.
For the post-CT scan setting, where an SPN was already detected via imaging, VOCs were incorporated alongside nodule size to create a model that further refined malignancy risk assessment. Nodule size, rather than morphology, was chosen as the supplementary parameter to exhaled VOCs, as SPNs that are considered suspicious based on their morphology will inevitably require follow-up with CT or additional diagnostics. With the post-CT model, we aim to address the clinical gap for nodules that are too large to be dismissed but do not yet exhibit other suspicious characteristics. These nodules necessitate extensive follow-up, despite only a small fraction being malignant.
The VOC-only model and the nodule size model performed similarly, but the combination of both improved diagnostic accuracy to an AUC of 0.90. Although this improvement was not statistically significant, the combined model achieved higher specificity than nodule size alone (0.976 versus 0.805) at the same sensitivity (0.792), indicating that VOC analysis provides valuable complementary information. The post-test probabilities supported this finding, as a positive result yielded a very high probability of malignancy (0.950), strongly supporting the presence of disease. Meanwhile, a negative result reduced the likelihood of malignancy to 0.111, suggesting a reasonable level of confidence in ruling out cancer. These findings highlight the potential of VOC-based breath analysis as an adjunct to imaging, enhancing the accuracy of malignancy predictions beyond what nodule size alone can achieve. This is particularly relevant in the context of emerging AI-assisted imaging tools, which have shown promise in improving lung nodule classification and reducing interobserver variability among radiologists [34–36]. While AI-driven imaging models continue to advance, VOC analysis offers unique metabolic insights that imaging alone may miss, reinforcing its role as a complementary diagnostic tool in LC assessment.
As VOCs are as effective as nodule size in the post-CT scan situation and AI has proven to improve diagnostic accuracy after CT, the added value of breath analysis likely lies in the upfront selection of persons fit for screening. Therefore, our study included a subgroup of participants eligible for LC screening [3]. The pre-CT scan model for this subgroup yielded high sensitivity (100%), specificity (100%) and AUC (0.900). This suggests that VOC analysis, when combined with LC screening criteria, could effectively enrich the screening population, improving sensitivity and reducing false-positive rates.
Recent studies have shown the potential of VOCs in exhaled breath as biomarkers for LC detection [17–23]. For instance, Chen et al. [17] identified 20 VOCs that could distinguish LC from healthy controls with an AUC of 0.987, highlighting the promise of breath analysis in early-stage cancer detection. Similarly, Fu et al. [23] demonstrated that carbonyl VOCs such as 2-butanone and 3-hydroxy-2-butanone were significantly elevated in LC patients, showing potential for noninvasive diagnostics. Furthermore, Phillips et al. [18] introduced the MAGIIC biomarker in breath, which effectively predicted both LC and pulmonary nodules, with implications for guiding biopsy decisions and reducing unnecessary invasive procedures. In addition, Buma et al. [22] validated e-Nose technology for LC detection, showcasing its ability to discriminate cancer patients from noncancer individuals, even across disease stages, with high accuracy.
Building upon these studies, our research incorporates a novel combination of VOCs with clinical factors (e.g. age, smoking and BMI) to improve predictive accuracy for LC and pulmonary nodule malignancy. Unlike many studies that focus on pattern recognition or small sample sizes, our approach selects specific VOCs based on robust statistical methods such as penalised regression and backward regression, ensuring biological relevance and clinically actionable results. We also address one of the limitations of prior studies by incorporating background air sampling to minimise environmental influences, enhancing the specificity of our findings.
Despite promising results, several limitations must be acknowledged. First, this study was conducted at a single centre with a moderate sample size, which limits the generalisability of our findings. Larger, multicentre studies are needed for external validation. Significant differences in age, smoking status and nodule size between patient groups could have influenced the outcomes [37]. Although correlation analysis showed no significant associations between VOCs and smoking status or sex, VOC128 showed a positive correlation with age, suggesting that it may be more associated with this clinical feature than with cancer metabolism. Nodule size was correlated with three out of four selected VOCs, which aligns with the established relationship between larger nodules and higher malignancy risk. While this correlation may introduce a limitation, it also reflects expected tumour dynamics, consistent with clinical guidelines emphasising nodule size as a malignancy indicator. VOC58, however, was not correlated with nodule size, suggesting that it may arise primarily from metabolic changes rather than tumour volume, supporting its role as a diagnostic marker. Although the groups did not differ significantly in sex, BMI or pack-years, the uneven distribution of some variables, such as the higher proportion of males in both groups, could introduce bias.
Breathomics in general faces the challenge of standardisation. While we have made significant efforts to reduce experimental variability, such as implementing strict pre-sample collection protocols and minimising environmental influences, the generalisability of these results remains limited, as patient preparation and environmental control can vary significantly across studies.
Our study also carries a risk of overfitting due to the small sample size and the unequal group sizes. To mitigate this, we used cross-validation and two feature selection methods, LASSO regression and backward regression, to confirm the robustness of our findings. Nevertheless, external validation with larger cohorts is essential to ensure clinical applicability. Furthermore, while we took steps to minimise environmental contamination by using inert sampling materials and calculating the alveolar gradient of VOCs, some environmental influences may still have affected the results [38].
Another limitation was that the MCC/IMS method provides “pseudo-identifications” of VOCs, which means that the exact chemical identities of the VOCs remain unknown. However, the tentative identification of compounds is possible. This limitation is similar to what Gordon et al. [39] discussed in their work on VOC analysis for LC detection, where they emphasised that focusing on a single or small set of VOCs might not be as effective as using a comprehensive VOC fingerprinting approach. Our study aligns with this principle by adopting a broader, multiparametric approach to VOC analysis.
In previous research, pentanal (VOC37) was detected only in LC patients and not in healthy nonsmokers, and it has been linked to oxidative stress and inflammation [40, 41]. Similarly, ammonia (VOC46) has been associated with liver dysfunction and urea cycle dysregulation in cancer [42]. VOC58 ((1α,2β,5α)-5-methyl-2-(1-methylethyl)cyclohexanol) may influence cancer cell signalling [43] and VOC128 (3-pentanone), related to lipid metabolism and energy production, has been linked to cancer cell metabolism [44]. These findings align with previous research, but further validation using more precise methods such as GC–MS is needed for definitive identification.
Regarding the practicality of implementing breath analysis, the MCC/IMS device is compact and offers real-time analysis, but the breath sampling procedure in this study followed European recommendations, which included restrictions such as fasting, refraining from smoking and avoiding oral hygiene. While these measures improve accuracy, they may limit the test's practical use in large-scale screenings or primary care settings.
In conclusion, our study demonstrates the feasibility of using breath analysis for distinguishing malignant from benign SPNs. The pre-CT scan model, combining VOCs with clinical factors, offers a promising tool for triaging patients, while the post-CT scan model enhances diagnostic accuracy alongside imaging. Despite some limitations, our findings support the potential of VOCs as a diagnostic tool in LC management, and future research should focus on multicentre validation.
Acknowledgments
We would like to express our gratitude to all our co-workers at the Antwerp University Hospital. We also want to extend our thanks to the patients and participants for their valuable contributions to this research.
Footnotes
Provenance: Submitted article, peer reviewed.
Ethics statement: The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Antwerp University Hospital (Belgian registration number B300201837007) on 12 April 2022. Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this article.
Author contributions: Conceptualisation: K. Lamote, J.P. van Meerbeeck, A. Janssens and A. Snoeckx; methodology: K. Zwijsen, R.R.L. Wener, K. Lamote, A. Janssens, A. Snoeckx and J.P. van Meerbeeck; formal analysis: K. Zwijsen and R.R.L. Wener; investigation: K. Zwijsen, R.R.L. Wener, E. Heirwegh, E. Schillebeeckx, P.E. van Schil and J. Raskin; data curation: K. Zwijsen and E. Schillebeeckx; writing (original draft preparation): K. Zwijsen; writing (review and editing): K. Zwijsen, R.R.L. Wener, E. Heirwegh, E. Schillebeeckx, J. Raskin, E. Marcq, J.P. van Meerbeeck, P.E. van Schil, A. Snoeckx, A. Janssens and K. Lamote; visualisation: K. Zwijsen; supervision: A. Janssens, J.P. van Meerbeeck and K. Lamote; funding acquisition: K. Lamote, A. Janssens and J.P. van Meerbeeck. All authors have read and agreed to the published version of the manuscript.
This article has an editorial commentary: https://doi.org/10.1183/23120541.01235-2025
Conflict of interest: The authors declare no conflict of interest.
Support statement: This research was funded by the Stichting Tegen Kanker (Foundation Against Cancer) (convention number 2020-124). The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Funding information for this article has been deposited with the Open Funder Registry.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material
00827-2025.SUPPLEMENT
References
- 1. Siegel RL, Miller KD, Fuchs HE, et al. Cancer statistics, 2021. CA Cancer J Clin 2021; 71: 7–33. doi: 10.3322/caac.21654
- 2. Kauczor HU, Bonomo L, Gaga M, et al. ESR/ERS white paper on lung cancer screening. Eur Respir J 2015; 46: 28–39. doi: 10.1183/09031936.00033015
- 3. National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011; 365: 395–409. doi: 10.1056/NEJMoa1102873
- 4. de Koning HJ, van der Aalst CM, de Jong PA, et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med 2020; 382: 503–513. doi: 10.1056/NEJMoa1911793
- 5. Cassidy A, Myles JP, van Tongeren M, et al. The LLP risk model: an individual risk prediction model for lung cancer. Br J Cancer 2008; 98: 270–276. doi: 10.1038/sj.bjc.6604158
- 6. Roe OD, Markaki M, Tsamardinos I, et al. ‘Reduced’ HUNT model outperforms NLST and NELSON study criteria in predicting lung cancer in the Danish screening trial. BMJ Open Respir Res 2019; 6: e000512. doi: 10.1136/bmjresp-2019-000512
- 7. Tammemagi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med 2013; 368: 728–736. doi: 10.1056/NEJMoa1211776
- 8. Bonney A, Malouf R, Marchal C, et al. Impact of low-dose computed tomography (LDCT) screening on lung cancer-related mortality. Cochrane Database Syst Rev 2022; 8: CD013829. doi: 10.1002/14651858.CD013829.pub2
- 9. Gould MK, Tang T, Liu IL, et al. Recent trends in the identification of incidental pulmonary nodules. Am J Respir Crit Care Med 2015; 192: 1208–1214. doi: 10.1164/rccm.201505-0990OC
- 10. Walter JE, Heuvelmans MA, Oudkerk M. Small pulmonary nodules in baseline and incidence screening rounds of low-dose CT lung cancer screening. Transl Lung Cancer Res 2017; 6: 42–51. doi: 10.21037/tlcr.2016.11.05
- 11. Bankier AA, MacMahon H, Colby T, et al. Fleischner society: glossary of terms for thoracic imaging. Radiology 2024; 310: e232558. doi: 10.1148/radiol.232558
- 12. Truong MT, Ko JP, Rossi SE, et al. Update in the evaluation of the solitary pulmonary nodule. Radiographics 2014; 34: 1658–1679. doi: 10.1148/rg.346130092
- 13. MacMahon H, Naidich DP, Goo JM, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner society 2017. Radiology 2017; 284: 228–243. doi: 10.1148/radiol.2017161659
- 14. Callister ME, Baldwin DR, Akram AR, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015; 70: Suppl. 2, ii1–ii54. doi: 10.1136/thoraxjnl-2015-207168
- 15. Keogh RJ, Riches JC. The use of breath analysis in the management of lung cancer: is it ready for primetime? Curr Oncol 2022; 29: 7355–7378. doi: 10.3390/curroncol29100578
- 16. Janssens E, van Meerbeeck JP, Lamote K. Volatile organic compounds in human matrices as lung cancer biomarkers: a systematic review. Crit Rev Oncol Hematol 2020; 153: 103037. doi: 10.1016/j.critrevonc.2020.103037
- 17. Chen X, Muhammad KG, Madeeha C, et al. Calculated indices of volatile organic compounds (VOCs) in exhalation for lung cancer screening and early detection. Lung Cancer 2021; 154: 197–205. doi: 10.1016/j.lungcan.2021.02.006
- 18. Phillips M, Bauer TL, Pass HI. A volatile biomarker in breath predicts lung cancer and pulmonary nodules. J Breath Res 2019; 13: 036013. doi: 10.1088/1752-7163/ab21aa
- 19. Peled N, Hakim M, Bunn PA, et al. Non-invasive breath analysis of pulmonary nodules. J Thorac Oncol 2012; 7: 1528–1533. doi: 10.1097/JTO.0b013e3182637d5f
- 20. Wang H, Wu Y, Sun M, et al. Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study. Sci Rep 2024; 14: 8731. doi: 10.1038/s41598-024-59474-w
- 21. Lu G, Su Z, Yu X, et al. Differentiating pulmonary nodule malignancy using exhaled volatile organic compounds: a prospective observational study. Cancer Med 2025; 14: e70545. doi: 10.1002/cam4.70545
- 22. Buma AIG, Muntinghe-Wagenaar MB, van der Noort V, et al. Lung cancer detection by electronic nose analysis of exhaled breath: a multicentre prospective external validation study. Ann Oncol 2025; 36: 786–795. doi: 10.1016/j.annonc.2025.03.013
- 23. Fu XA, Li M, Knipp RJ, et al. Noninvasive detection of lung cancer using exhaled breath. Cancer Med 2014; 3: 174–181. doi: 10.1002/cam4.162
- 24. Janssens E, Schillebeeckx E, Zwijsen K, et al. External validation of a breath-based prediction model for malignant pleural mesothelioma. Cancers 2022; 14: 3182. doi: 10.3390/cancers14133182
- 25. Zwijsen K, Schillebeeckx E, Janssens E, et al. Determining the clinical utility of a breath test for screening an asbestos-exposed population for pleural mesothelioma: baseline results. J Breath Res 2023; 17: 046005. doi: 10.1088/1752-7163/acf7e3
- 26. Horvath I, Barnes PJ, Loukides S, et al. A European Respiratory Society technical standard: exhaled biomarkers in lung disease. Eur Respir J 2017; 49: 1600965. doi: 10.1183/13993003.00965-2016
- 27. Lamote K, Vynck M, Thas O, et al. Exhaled breath to screen for malignant pleural mesothelioma: a validation study. Eur Respir J 2017; 50: 1700919. doi: 10.1183/13993003.00919-2017
- 28. Baumbach JI. Process analysis using ion mobility spectrometry. Anal Bioanal Chem 2006; 384: 1059–1070. doi: 10.1007/s00216-005-3397-8
- 29. Youden WJ. Index for rating diagnostic tests. Cancer 1950; 3: 32–35.
- 30. van der Helm HJ, Hische EA. Application of Bayes's theorem to results of quantitative clinical chemical determinations. Clin Chem 1979; 25: 985–988. doi: 10.1093/clinchem/25.6.985
- 31. Baumbach JI. Ion mobility spectrometry coupled with multi-capillary columns for metabolic profiling of human breath. J Breath Res 2009; 3: 034001. doi: 10.1088/1752-7155/3/3/034001
- 32. Bunkowski A, Bödeker B, Bader S, et al. MCC/IMS signals in human breath related to sarcoidosis-results of a feasibility study using an automated peak finding procedure. J Breath Res 2009; 3: 046001. doi: 10.1088/1752-7155/3/4/046001
- 33. Belgian Cancer Registry. Cancer Fact Sheet: Longkanker 2022. Brussels, Belgium, Belgian Cancer Registry, 2022.
- 34. Ciompi F, Chung K, van Riel SJ, et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Sci Rep 2017; 7: 46479. doi: 10.1038/srep46479
- 35. Christe A, Leidolt L, Huber A, et al. Lung cancer screening with CT: evaluation of radiologists and different computer assisted detection software (CAD) as first and second readers for lung nodule detection at different dose levels. Eur J Radiol 2013; 82: e873–e878. doi: 10.1016/j.ejrad.2013.08.026
- 36. Bolte H, Jahnke T, Schäfer FK, et al. Interobserver-variability of lung nodule volumetry considering different segmentation algorithms and observer training levels. Eur J Radiol 2007; 64: 285–295. doi: 10.1016/j.ejrad.2007.02.031
- 37. Dragonieri S, Quaranta VN, Carratu P, et al. Influence of age and gender on the profile of exhaled volatile organic compounds analyzed by an electronic nose. J Bras Pneumol 2016; 42: 143–145. doi: 10.1590/S1806-37562015000000195
- 38. Hewitt MJ, Belluomo I, Zuffa S, et al. Variation of volatile organic compound levels within ambient room air and its impact upon the standardisation of breath sampling. Sci Rep 2022; 12: 15887. doi: 10.1038/s41598-022-20365-7
- 39. Gordon SM, Szidon JP, Krotoszynski BK, et al. Volatile organic compounds in exhaled air from patients with lung cancer. Clin Chem 1985; 31: 1278–1282.
- 40. Wang P, Huang Q, Meng S, et al. Identification of lung cancer breath biomarkers based on perioperative breathomics testing: a prospective observational study. eClinicalMedicine 2022; 47: 101384. doi: 10.1016/j.eclinm.2022.101384
- 41. Ulanowska A, Kowalkowski T, Trawińska E, et al. The application of statistical methods using VOCs to identify patients with lung cancer. J Breath Res 2011; 5: 046008. doi: 10.1088/1752-7155/5/4/046008
- 42. Spanel P, Dryahina K, Smith D. A quantitative study of the influence of inhaled compounds on their concentrations in exhaled breath. J Breath Res 2013; 7: 017106. doi: 10.1088/1752-7155/7/1/017106
- 43. Slefarska-Wolak D, Heinzle C, Leiherer A, et al. Volatilomic signatures of AGS and SNU-1 gastric cancer cell lines. Molecules 2022; 27: 4012. doi: 10.3390/molecules27134012
- 44. Jia Z, Patra A, Kutty VK, et al. Critical review of volatile organic compound analysis in breath and in vitro cell culture for detection of lung cancer. Metabolites 2019; 9: 52. doi: 10.3390/metabo9030052