Abstract
Background
Lung cancer has among the highest incidence and mortality rates of all cancers worldwide. In this study, we analyzed the metabolic profiles of non‐small cell lung cancer (NSCLC) patients and constructed separate prediction models for smokers and nonsmokers, with internal validation.
Methods
Plasma was collected from all patients enrolled for metabolic profiling by liquid chromatography–tandem mass spectrometry (LC–MS/MS). The total population was divided into two groups according to smoking or not. Statistical analysis of metabolites was performed separately for each group and prediction models were constructed.
Results
A total of 1723 patients (1109 NSCLC patients and 614 healthy controls) were enrolled from the affiliated hospital during 2018 to 2021. After grouping by smoking history, each group was statistically analyzed and prediction models were constructed, which resulted in eight indicators (propionylcarnitine, arginine, citrulline, etc.) significantly associated with lung cancer risk for smokers and eight indicators (dodecanoylcarnitine, hydroxybutyrylcarnitine, asparagine, etc.) for nonsmokers (p < 0.05). The smoker model indicated an AUC of 0.860 in the training set and 0.850 in the validation set. The nonsmoker model showed an AUC of 0.783 in the training set and 0.762 in the validation set. Further calibration tests for both models indicated excellent goodness‐of‐fit results.
Conclusions
In this study, we identified a series of metabolites significantly associated with lung cancer and constructed separate prediction models for NSCLC risk in smokers and nonsmokers. Internal validation confirmed that both models discriminate lung cancer risk in their respective populations.
Keywords: metabolite, nomogram, non‐small cell lung cancer, prediction model
A total of 1723 patients (1109 NSCLC patients and 614 healthy controls) were recruited from the affiliated hospital. After grouping by smoking history, each group was statistically analyzed to construct prediction models composed of serum metabolites. The smoker model showed an AUC of 0.860 in the training set and 0.850 in the validation set. The nonsmoker model showed an AUC of 0.783 in the training set and 0.762 in the validation set. Calibration tests on both models showed good fit.

INTRODUCTION
Lung cancer is one of the most common malignancies worldwide and the leading cause of cancer‐related death, with more than 1.7 million deaths globally in 2020. 1 In China, the incidence and mortality rates of lung cancer are consistently the highest among all cancers. 2 As there are often no specific clinical signs in the early stages of the disease, most lung cancer patients are diagnosed in the middle to late stages, and as a result the 5‐year survival rate of lung cancer patients in China remains below 20%. 3 Research into effective early screening strategies is therefore an important part of improving the detection rate of lung cancer.
Numerous clinical studies and trials have demonstrated that low‐dose computed tomography (LDCT) screening can effectively reduce lung cancer mortality, and LDCT has been widely adopted around the world. 4 In North America, lung cancer screening guidelines published by the National Comprehensive Cancer Network (NCCN) recommend regular LDCT screening for people at high risk of lung cancer (age ≥ 50 years and smoking history ≥ 20 pack‐years). 5 However, LDCT still has problems such as false‐positive results and radiation hazards, 6 and in some regions with relatively lower economic levels there is resistance to implementing LDCT screening on a large scale. Current research on biomolecules provides more possibilities for enriching the means of early lung cancer screening and guiding precision medicine. Because samples are easily accessible, biomarkers such as neuron‐specific enolase (NSE) and squamous cell carcinoma antigen (SCCA) have been developed clinically as complementary tools for lung cancer diagnosis, but their sensitivity and specificity are relatively limited. In addition to these traditional biomarkers, a large number of studies in recent years have used metabolomics to perform high‐throughput analysis of human small‐molecule metabolites and acquire key information on metabolic reprogramming in the tumor microenvironment, which can be translated into clinical diagnostic and therapeutic approaches. Some metabolites have therefore been used as biomarkers in the construction of diagnostic models for lung cancer. 7 A representative study is the AICS model from Japan, which used the concentrations of 19 free amino acids in plasma, including tryptophan, glutamate, and alanine, to build a prediction model. The model classified participants' cancer risk by predicted probability, thus investigating the incidence of different cancers. 
In a multicenter study, the AICS model was clinically validated for predicting the incidence of various malignancies, which is useful for long‐term monitoring of high‐risk groups. 8 In addition, some studies have used machine learning methods and statistical tools such as nomograms to visualize the prediction model, which enriches the application scenarios. 9 , 10 However, there are no prediction models that stratify the population by smoking history and precisely distinguish the metabolic differences between smoking and nonsmoking populations. Because the incidence of lung cancer among nonsmokers, especially women, is significantly higher in East Asia than in European and American countries, 11 predicting the risk of lung cancer from clinical characteristics and metabolic markers separately for smokers and nonsmokers would benefit accurate early screening.
In this study, we aimed to investigate the metabolic differences between smokers and nonsmokers in the lung cancer population through stratified analysis of blood samples. These differing characteristics may be key to understanding the potential risk in patients with lung cancer. In addition, we aimed to construct and internally validate separate clinical prediction models for lung cancer in each smoking status group, to provide a more accurate and objective detection tool for early screening.
METHODS
Study population
This was a retrospective study, and the flow chart is shown in Figure 1. A total of 1723 patients, including 1109 lung cancer patients (1046 adenocarcinomas, 60 squamous carcinomas, and 3 patients with untyped NSCLC) and 614 healthy controls, were enrolled in the study. The population was divided into a smoking group (251 lung cancer patients and 147 healthy controls) and a nonsmoking group (858 lung cancer patients and 467 healthy controls) according to the patient's self‐reported history of smoking. In the first stage, differences in metabolic markers between lung cancer patients and healthy controls were assessed separately in each group. Next, the correlation between metabolites and lung cancer risk in each group was analyzed individually, and clinical prediction models were constructed for each group. All patients were recruited from the Second Hospital of Dalian Medical University during 2018 to 2021.
FIGURE 1.

Overview of the flow chart for study design and data analysis.
All enrolled individuals were ≥ 18 years old. The inclusion criteria for lung cancer patients were primary lung malignancy and pathologically confirmed NSCLC according to the 2021 edition of the WHO standards for lung cancer classification. At the time of diagnosis, patients had not yet received any anti‐tumor therapy such as radiotherapy, chemotherapy, or immunotherapy, and had complete medical records and no history of other malignancies. The healthy controls were individuals who attended routine physical examinations during the same time period, with complete medical records and no history of malignant tumors. Individuals with autoimmune diseases, serious heart, liver, or kidney diseases, metabolic syndrome, or other diseases that may cause metabolic disorders were excluded. The study was approved by the Ethics Committee of the Second Hospital of Dalian Medical University.
Plasma sample collection
All participants followed the standard blood collection procedure. After fasting for more than 8 h, 5 mL of blood was collected from a peripheral vein into collection tubes containing ethylenediaminetetraacetic acid (EDTA), and the dried blood spot (DBS) method was used to form homogeneous blood spots for metabolic profiling. All samples were processed within 6 h of sampling.
Targeted metabolic profile analysis
All samples were assayed by specialist physicians in the hospital laboratory department and the targeted metabolic profile analysis was performed by LC–MS/MS. After sample pretreatment in strict accordance with the standard procedure, the samples were analyzed by an API 3200 chromatography‐tandem mass spectrometer with an electrospray ionization source and the operating software Chemo View 1.4.2, and an Agilent 1200 high‐performance liquid chromatograph (Agilent Technologies, USA).
Statistical analysis
SPSS 24.0 and R 4.0.3 were used for statistical analysis. Continuous variables with normal distribution were expressed as mean ± standard deviation (x¯±s), with the independent samples t‐test for comparison between groups. Continuous variables with skewed distribution were expressed as median and quartiles (Q1, Q3), and the Mann–Whitney U test was used for comparison between groups. Categorical variables were expressed as the number of cases and percentages, with the chi‐squared test used for comparisons between groups. In the training set, the least absolute shrinkage and selection operator (Lasso) algorithm was used to regularize and select candidate variables via the “glmnet” package in R, and multivariate logistic regression analysis was used for correlation analysis and construction of regression equations. The prediction models and nomograms were built using the “rms” package in R. The models were further tested in both the training and validation sets. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to check model discrimination, calibration curves were plotted, and model fit was assessed by the Hosmer‐Lemeshow (H‐L) test. Decision curve analysis (DCA) was used to assess the net clinical benefit of the model.
RESULTS
Characteristics of the study population
All data were preanalyzed before grouping to establish the rationale for stratification by smoking history; the results of the comparative analysis are shown in Figure 2 and Table S1. Afterwards, the entire study population was divided into smoking and nonsmoking groups according to smoking history, and independent prediction models (for smokers and nonsmokers) were constructed. The datasets for both the smoking and nonsmoking groups were randomly divided 7:3 into training and validation sets prior to analysis, and the population characteristics of each dataset are shown in Table 1. No variable differed significantly between the training and validation sets (p > 0.05), so the two datasets were comparable (see Table S2 for details).
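The 7:3 random division can be illustrated with a minimal sketch. The function name, the fixed seed, and the round‐half‐up rule are assumptions made for illustration only (the study states only that the split was random and 7:3); with this rounding rule the group sizes happen to match those reported in Table 1.

```python
import random

def split_70_30(ids, seed=0):
    """Shuffle ids reproducibly and return (training, validation) lists."""
    rng = random.Random(seed)          # seed is an arbitrary illustrative choice
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point surprises; rounds half up.
    n_train = (7 * len(shuffled) + 5) // 10
    return shuffled[:n_train], shuffled[n_train:]

# Group sizes from the study: 398 smokers and 1325 nonsmokers.
smoking_train, smoking_valid = split_70_30(range(398))
nonsmoking_train, nonsmoking_valid = split_70_30(range(1325))
print(len(smoking_train), len(smoking_valid))
print(len(nonsmoking_train), len(nonsmoking_valid))
```

The split here is a simple random split without stratification by case status; whether the study stratified is not stated.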
FIGURE 2.

Differential analysis of metabolic profiles in different groups. Box and whisker plots of blood metabolite concentration comparisons between lung cancer populations and healthy controls (gray colored area) and between smokers and nonsmokers among lung cancer populations (white colored area). The above 19 metabolites were differentially expressed in the lung cancer populations (p < 0.05 compared to healthy controls) and were also significantly different between smoking and nonsmoking patients among the lung cancer populations (p < 0.05). C3, propionylcarnitine; C4, butyrylcarnitine; C4DC, succinylcarnitine; C4OH, hydroxybutyrylcarnitine; C5, isovalerylcarnitine; C5OH, hydroxyisovalerylcarnitine; C20, eicosylcarnitine; C26, hexacosanoylcarnitine; Arg, arginine; Asn, asparagine; Asp, aspartic acid; Cit, citrulline; Phe, phenylalanine; Val, valine; EA, eicosenoic acid; PUFAs, polyunsaturated fatty acids; FAs, fatty acids; ARA, arachidonic acid; Omega‐6, unsaturated fatty acids ω6.
TABLE 1.
Characteristics of the population in the training and validation sets.
| | Training set | Validation set | Z/χ² | p‐values |
|---|---|---|---|---|
| Smoking group | N = 279 | N = 119 | | |
| Age (years) | 62 (55, 69) | 64 (57, 69) | −0.618 | 0.537 |
| Lung cancer patients | 64 (58, 69.75) | | | |
| Healthy controls | 58 (53, 68) | | | |
| Z/χ² | −3.681 | | | |
| p‐values | <0.001 | | | |
| Male (n, %) | 267 (95.7) | 110 (92.4) | 1.776 | 0.183 |
| Lung cancer patients | 165 (95.9) | | | |
| Healthy controls | 102 (95.3) | | | |
| Z/χ² | 0.058 | | | |
| p‐values | 0.809 | | | |
| Nonsmoking group | N = 928 | N = 397 | | |
| Age (years) | 62 (53, 68) | 62 (54, 68) | −0.714 | 0.475 |
| Lung cancer patients | 62 (53, 69) | | | |
| Healthy controls | 62 (52, 68) | | | |
| Z/χ² | −1.478 | | | |
| p‐values | 0.140 | | | |
| Male (n, %) | 249 (26.8) | 96 (24.2) | 1.014 | 0.314 |
| Lung cancer patients | 120 (20.2) | | | |
| Healthy controls | 129 (38.6) | | | |
| Z/χ² | 36.951 | | | |
| p‐values | <0.001 | | | |
Differences in metabolic profiles between lung cancer populations and healthy controls
Univariate analysis was performed for metabolite variables between the lung cancer populations and the healthy controls separately in the smoking and nonsmoking groups, and variables with statistically significant differences (p < 0.05) were retained. In the smoking group, 30 of 65 metabolites were statistically different, including 13 carnitines (e.g., propionylcarnitine, isopentenoylcarnitine, hexacosanoylcarnitine), six amino acids (alanine, arginine, asparagine, citrulline, serine, and valine), and 11 fatty acids (e.g., palmitoleic acid, oleic acid, arachidic acid). In the nonsmoking group, a total of 39 metabolites were statistically different, including 19 carnitines (e.g., hydroxybutyrylcarnitine, dodecanoylcarnitine, hexacosanoylcarnitine), eight amino acids (asparagine, glutamic acid, methionine, ornithine, phenylalanine, serine, tyrosine, and valine), and 12 fatty acids (e.g., oleic acid, eicosatrienoic acid, myristic acid). All statistical analyses were performed within the training set and the results are detailed in Table S3. The above metabolites were carried forward to multivariate analysis.
Generation of metabolic predictors for lung cancer risk
Lasso regression was used for further screening of the above candidate variables in the multivariate analysis stage. It addresses multicollinearity among metabolites by introducing a penalty parameter, while avoiding model overfitting and accomplishing variable selection. The differential metabolites retained in the smoking and nonsmoking groups above were evaluated separately by Lasso regression, which was calculated using the “glmnet” package in R. The procedure is shown in Figure 3: as the penalty parameter λ (lambda) increases, the regression coefficient of each covariate shrinks toward 0 at a different rate. Ten‐fold cross‐validation was used for fitting, and the optimal λ was then determined. As shown in Figure 4, the two dotted lines orthogonal to the x‐axis represent the two candidate values of λ. “Lambda.min” is the penalty parameter at the minimum mean cross‐validated error (cvm), and “lambda.1se” is the largest penalty parameter whose cvm is within 1 standard error of that minimum. 12 To reduce errors and ensure model accuracy, “lambda.min” was selected as the optimal parameter (lambda.min = 0.01193467 for the smoking group and lambda.min = 0.004889447 for the nonsmoking group). The final retained predictors were as follows: 19 metabolites were retained in the smoking group (propionylcarnitine, arginine, palmitoleic acid, etc.) and 31 metabolites were retained in the nonsmoking group (hydroxybutyrylcarnitine, asparagine, oleic acid, etc.). The regression coefficients of the retained variables are detailed in Table S4. These metabolic predictors were further evaluated in the multivariate correlation analysis. All statistical analyses were performed in the training sets.
FIGURE 3.

Lasso coefficient profiles of metabolites based on the log (λ) sequence. (a) Nineteen metabolites were retained in the smoking group. (b) A total of 31 metabolites were retained in the nonsmoking group.
FIGURE 4.

Penalty parameter (λ) selection in Lasso model used 10‐fold cross‐validation via minimum criteria. (a) λ = 0.01193467 for the smoking group. (b) λ = 0.004889447 for the nonsmoking group.
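The choice between the two candidate penalty parameters can be sketched as follows. This is an illustration of the selection rule only, not the “glmnet” implementation; the function name and the cross‐validation curve are hypothetical and are not taken from the study.

```python
def pick_lambdas(lambdas, cvm, cvsd):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambdas: candidate penalty values
    cvm:     mean cross-validated error at each lambda
    cvsd:    standard error of that mean at each lambda
    """
    # lambda.min: the penalty with the smallest mean cross-validated error.
    i_min = min(range(len(cvm)), key=lambda i: cvm[i])
    # lambda.1se: the largest penalty whose error stays within one
    # standard error of that minimum (a sparser, more conservative model).
    threshold = cvm[i_min] + cvsd[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cvm) if err <= threshold)
    return lambdas[i_min], lambda_1se

# Hypothetical CV curve (illustrative numbers only).
lambdas = [0.001, 0.005, 0.012, 0.030, 0.080]
cvm = [1.10, 1.04, 1.02, 1.05, 1.20]
cvsd = [0.04, 0.04, 0.04, 0.04, 0.05]
lam_min, lam_1se = pick_lambdas(lambdas, cvm, cvsd)
print(lam_min, lam_1se)
```

Because “lambda.1se” is never smaller than “lambda.min”, it retains the same number of predictors or fewer; choosing “lambda.min”, as in this study, favors lower cross‐validated error over sparsity.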
Multivariate correlation analysis and the construction of models
A stepwise logistic regression was used to identify possible predictors of the outcome (lung cancer), which included age, gender and metabolic indicators screened above. At each step, variables were selected based on p‐values, and specific p‐value thresholds were used to restrict the variables included in the final model (p < 0.05 as an entry criterion, p > 0.1 as an elimination criterion). The final results are shown in Table 2, in which eight variables were found to be significantly associated with the risk of lung cancer in the smoking group (propionylcarnitine [C3], isopentenoylcarnitine [C5:1], hexacosanoylcarnitine [C26], arginine [Arg], asparagine [Asn], citrulline [Cit], palmitoleic acid [POA], and monounsaturated fatty acids [MUFAs]). Eight variables were found to be significantly associated with the risk of lung cancer in the nonsmoking group (gender, hydroxybutyrylcarnitine [C4OH], dodecanoylcarnitine [C12], hexacosanoylcarnitine [C26], asparagine [Asn], oleic acid [OA], eicosatrienoic acid [EtrA], and saturated fatty acids [SFAs]). The relationship between lung cancer risk, covariates and regression coefficients was transformed into a nomogram using the “rms” package in R project, and each nomogram of the prediction model is shown in Figure 5.
TABLE 2.
Multivariate regression model based on logistic regression analysis.
| Variables | Coefficients | p‐value | OR | 95% CI |
|---|---|---|---|---|
| Model for smokers | ||||
| C3 | −0.397 | 0.045 | 0.672 | 0.456–0.992 |
| C5:1 | −0.019 | 0.006 | 0.981 | 0.968–0.995 |
| C26 | −0.017 | 0.087 | 0.983 | 0.964–1.002 |
| Arg | 0.115 | 0.001 | 1.121 | 1.045–1.203 |
| Asn | −0.018 | 0.007 | 0.982 | 0.969–0.995 |
| Cit | 0.067 | 0.001 | 1.069 | 1.027–1.114 |
| POA | 0.019 | <0.001 | 1.019 | 1.012–1.026 |
| MUFAs | −1.539 | <0.001 | 0.215 | 0.126–0.365 |
| Intercept | 2.623 | 0.001 | 13.778 | |
| Model for nonsmokers | ||||
| Gender | −0.513 | 0.004 | 0.598 | 0.42–0.852 |
| C4OH | −0.007 | <0.001 | 0.993 | 0.99–0.997 |
| C12 | 0.017 | <0.001 | 1.017 | 1.01–1.024 |
| C26 | −0.011 | 0.009 | 0.989 | 0.98–0.997 |
| Asn | −0.024 | <0.001 | 0.976 | 0.968–0.983 |
| OA | −0.480 | <0.001 | 0.619 | 0.487–0.786 |
| EtrA | 0.010 | <0.001 | 1.010 | 1.006–1.014 |
| SFAs | −0.128 | 0.008 | 0.880 | 0.801–0.968 |
| Intercept | 2.866 | <0.001 | 17.560 | |
Abbreviations: Arg, arginine; Asn, asparagine; C3, propionylcarnitine; C5:1, isopentenoylcarnitine; C12, dodecanoylcarnitine; C26, hexacosanoylcarnitine; C4OH, hydroxybutyrylcarnitine; Cit, citrulline; EtrA, eicosatrienoic acid; MUFAs, monounsaturated fatty acids; OA, oleic acid; POA, palmitoleic acid; SFAs, saturated fatty acids.
FIGURE 5.

Nomogram of the lung cancer risk prediction models. (a) Model for smokers. (b) Model for nonsmokers. *p < 0.05, **p < 0.01 and ***p < 0.001.
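How the coefficients in Table 2 translate into odds ratios and a predicted probability can be sketched as below. The coefficients are copied from Table 2 (model for smokers); the input values are hypothetical, and their units and any preprocessing applied by the authors are not stated here, so the resulting probability is purely illustrative.

```python
import math

# Coefficients copied from Table 2 (model for smokers).
SMOKER_COEFS = {
    "C3": -0.397, "C5:1": -0.019, "C26": -0.017, "Arg": 0.115,
    "Asn": -0.018, "Cit": 0.067, "POA": 0.019, "MUFAs": -1.539,
}
SMOKER_INTERCEPT = 2.623

def odds_ratio(coef):
    """Odds ratio per one-unit increase of a predictor: OR = exp(coefficient)."""
    return math.exp(coef)

def predicted_risk(values, coefs=SMOKER_COEFS, intercept=SMOKER_INTERCEPT):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    logit = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical measurements (assumed units; NOT taken from the study).
example = {"C3": 1.5, "C5:1": 20.0, "C26": 30.0, "Arg": 10.0,
           "Asn": 60.0, "Cit": 25.0, "POA": 50.0, "MUFAs": 2.0}

print(round(odds_ratio(SMOKER_COEFS["C3"]), 3))   # compare with the OR column
print(round(predicted_risk(example), 3))
```

A nomogram presents the same linear predictor graphically: each predictor contributes points in proportion to its coefficient times its value, and the total maps to the predicted probability.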
Validation of prediction models
The discrimination ability of the lung cancer risk models was tested using ROC curves and judged by AUC, which is generally interpreted as follows: 0.5 < AUC < 0.7 indicates average discrimination, 0.7 ≤ AUC < 0.8 acceptable discrimination, 0.8 ≤ AUC < 0.9 excellent discrimination, and AUC ≥ 0.9 outstanding discrimination. 13 , 14 The data from the training and validation sets of the two groups were entered into the models for testing. In the smoker model, the AUC of the training set (Figure 6a) was 0.860 (95% CI: 0.814–0.906, p < 0.05), and the AUC of the validation set (Figure 6b) was 0.850 (95% CI: 0.774–0.926, p < 0.05). In the nonsmoker model, the AUC of the training set (Figure 6c) was 0.783 (95% CI: 0.753–0.813, p < 0.05), and the AUC of the validation set (Figure 6d) was 0.762 (95% CI: 0.710–0.813, p < 0.05).
FIGURE 6.

Receiver operating characteristic (ROC) curve of the prediction models. (a) ROC of the training set in the smoker model. (b) ROC of the validation set in smoker model. (c) ROC of the training set in the nonsmoker model. (d) ROC of the validation set in nonsmoker model.
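The AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it that way and applies the interpretation bands quoted in the text; the function names and toy scores are illustrative assumptions, not the study's code or data.

```python
def auc(scores, labels):
    """AUC as the share of case/control pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def discrimination_label(value):
    """Interpretation bands as quoted in the text."""
    if value >= 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    if value > 0.5:
        return "average"
    return "no better than chance"  # band not given in the text; added here

# Toy predicted risks (hypothetical): two cases, two controls.
toy_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
print(toy_auc, discrimination_label(toy_auc))
print(discrimination_label(0.860), discrimination_label(0.762))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank‐based formulas give the same value more efficiently.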
The goodness of fit was examined using the H‐L test, with calibration curves plotted by the “rms” package in R. The null hypothesis of the test is that there is no difference between model predictions and observed outcomes; a p‐value > 0.05 means that the null hypothesis is not rejected, that is, there is no evidence of poor fit. 14 , 15 The training and validation set data from both groups were entered into the corresponding model separately for testing. In the smoker model, the p‐value of the H‐L test was 0.082 for the training set and 0.096 for the validation set. In the nonsmoker model, the p‐value of the H‐L test was 0.699 for the training set and 0.512 for the validation set, indicating that both models passed the goodness‐of‐fit test. The calibration curves were further plotted to visualize the results, as shown in Figure 7.
FIGURE 7.

Calibration curves of the prediction models. The dotted curve represents the original performance, and the solid curve represents the performance after bootstrapping (B = 1000 repetitions). (a) Calibration curves of the training set in the smoker model. (b) Calibration curves of the validation set in the smoker model. (c) Calibration curves of the training set in the nonsmoker model. (d) Calibration curves of the validation set in nonsmoker model.
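The H‐L statistic compares observed and expected case counts across groups of increasing predicted risk. The sketch below assumes ten equal‐sized groups and eight degrees of freedom, a common convention that the study does not specify; the function names and toy data are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p_value) using `groups` equal-sized risk groups.

    Assumes at least `groups` observations and mean risks strictly
    between 0 and 1 in every group.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of cases
        observed = sum(y for _, y in chunk)   # observed number of cases
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)

# Toy data (hypothetical): ten risk groups of 20 people with predicted risks
# 0.05, 0.15, ..., 0.95 and observed cases equal to the expected counts.
probs, outcomes = [], []
for g in range(10):
    p = 0.05 + 0.10 * g
    n_cases = 2 * g + 1                # equals 20 * p: 1, 3, ..., 19
    probs += [p] * 20
    outcomes += [1] * n_cases + [0] * (20 - n_cases)

stat, p_value = hosmer_lemeshow(probs, outcomes)
print(stat, p_value)
```

A large p‐value, as in this perfectly calibrated toy example, only means that miscalibration was not detected; the test is sensitive to sample size and to the choice of groups, which is one reason calibration curves are plotted alongside it.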
The utility of the prediction models was assessed using clinical decision curve analysis. As shown in Figure 8, the net benefit of the model (y‐axis) at different risk thresholds (x‐axis) was compared with that of the “intervention for all” strategy (curve corresponding to “All” in the legend) and the “intervention for none” strategy (line corresponding to “None” in the legend). 16 Figure 8a shows that in the smoker model, when risk thresholds were between 0.07 and 0.92, the net benefit of the model was higher than that of “intervention for all” and “intervention for none”. Figure 8b indicates that in the nonsmoker model, the net benefit of the model was higher than that of the other two strategies when risk thresholds were between 0.01 and 0.94. In conclusion, both prediction models in this study showed good utility and clinical benefit.
FIGURE 8.

The clinical decision curve analysis of net benefit for the prediction models. (a) Decision curve analysis (DCA) of the smoker model. (b) DCA of the nonsmoker model.
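The net benefit plotted on the y‐axis of a decision curve weighs true positives against false positives at a given risk threshold. A minimal sketch of the standard formula follows; the function names and toy data are illustrative assumptions, not the study's code or data.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the 'intervention for all' strategy."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# 'Intervention for none' has a net benefit of 0 at every threshold.

# Toy data (hypothetical): six people, three of whom have the disease.
probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(probs, outcomes, 0.5)
nb_all = net_benefit_treat_all(outcomes, 0.5)
print(nb_model, nb_all)
```

A model is considered useful over the range of thresholds where its net benefit exceeds both reference strategies, which is how the threshold ranges reported for Figure 8 are read.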
DISCUSSION
Lung cancer remains the deadliest malignancy worldwide, and due to the lack of specific symptoms in the early stage, most patients are diagnosed at an advanced stage and have a worse prognosis. Although early screening for lung cancer has been widely implemented around the world, the high‐risk groups included are generally long‐term smokers only. As the proportion of nonsmokers among lung cancer patients has continued to increase in recent years, there is a need to expand the range of screening to improve the detection rate. So far, some studies have constructed lung cancer risk prediction models using clinically confirmed risk indicators, such as personal history and exposure history, and have combined them with LDCT as a complementary tool for lung cancer screening. Some have been applied in clinical practice, such as PLCOm2012 from North America, a lung cancer prediction model constructed from the cohort of a screening trial for prostate, lung, colorectal, and ovarian cancers using clinical information such as the smoking history of participants. 17 Compared with the screening criteria of the National Lung Screening Trial (NLST), selecting individuals for LDCT screening with PLCOm2012 significantly improved diagnostic sensitivity (83.0% for PLCOm2012 and 71.1% for NLST, p < 0.001) and positive predictive value (4.0% for PLCOm2012 and 3.4% for NLST, p = 0.01) with no loss of specificity (62.9% for PLCOm2012 and 62.7% for NLST, p = 0.54). 18 The model has been validated by different research teams in several countries, including the United States, Germany, and the United Kingdom, but further multicenter prospective studies are needed to compare its advantages and disadvantages against other screening criteria. 19 In another study based on a case–control cohort in Liverpool, UK, the investigators constructed a predictive model called the Liverpool Lung Project (LLP) model. 
The model was calibrated with data from the UK Lung Screening Trial, with continuous adjustment of the model parameters. The final version 3 of the prediction model, LLPv3, was validated using 5‐year follow‐up data from 75 958 people and analyzed with ROC curves and H‐L tests. The results showed that the model had good discrimination, with an AUC of 0.81 (95% CI: 0.79–0.82). However, the model still had a substantial false‐positive rate, with actual cancer outcomes being less than 50% of the predicted counts. 20 Due to the significant demographic heterogeneity in education, household income, and lifestyle among different populations, the above models still lack sufficient data support for application to East Asian populations. They also did not stratify the population by smoking history and therefore are not adapted to the current epidemiological trends of lung cancer.
Therefore, this study aimed to construct lung cancer prediction models using serum small molecule metabolites; meanwhile, we stratified the patients based on their smoking history and constructed objective prediction models for smokers and nonsmokers separately. A nomogram, a graphical computational tool, was used to visualize the models by converting each predictor and its regression coefficient into a two‐dimensional function image. 21 A total of one demographic indicator (gender) and 13 metabolic indicators (propionylcarnitine, isopentenoylcarnitine, hexacosanoylcarnitine, dodecanoylcarnitine, hydroxybutyrylcarnitine, arginine, asparagine, citrulline, palmitoleic acid, monounsaturated fatty acids, oleic acid, eicosatrienoic acid, and saturated fatty acids) were eventually incorporated into the models, with the smoker model including eight indicators (propionylcarnitine, isopentenoylcarnitine, hexacosanoylcarnitine, arginine, asparagine, citrulline, palmitoleic acid, and monounsaturated fatty acids), and eight indicators (gender, hydroxybutyrylcarnitine, dodecanoylcarnitine, hexacosanoylcarnitine, asparagine, oleic acid, eicosatrienoic acid, and saturated fatty acids) were included in the model for nonsmokers. The models were validated to have good efficacy in predicting the risk of lung cancer in different smoking populations.
A total of five carnitine metabolites were included in the models, of which four acyl‐carnitines (propionylcarnitine, isopentenoylcarnitine, hexacosanoylcarnitine, and hydroxybutyrylcarnitine) all showed a negative correlation with lung cancer. This is consistent with the conclusion of Smith‐Byrne et al., whose study on lung cancer risk prediction based on serum metabolites found that elevated acyl‐carnitine is negatively associated with lung cancer risk. 22 Circulating isovalerylcarnitine (IVC) in this study is a substrate involved in the degradation process of leucine. As an essential amino acid, leucine is involved in metabolic regulation through the mammalian target of rapamycin complex 1 (mTORC1) pathway, affecting proliferative signals within tumor cells. 23 Circulating IVC also acts as a selective activator of calpain and is involved in the induction of apoptosis. 24 , 25 Therefore, lower levels of acyl‐carnitine may interfere with programmed cell death. However, evidence on the correlation between acyl‐carnitine and lung cancer remains limited. In other tumors, Lu et al. noted that acetylcarnitine levels were significantly lower in patients with hepatocellular carcinoma than in normal subjects and correlated with liver cancer prognosis. 26 A further study by Zhao et al. elucidated the relevant mechanism, in which cysteine‐rich intestinal protein 1 downregulated acetylcarnitine levels by inhibiting carnitine synthesis, reduced β‐catenin acetylation, and promoted activation of the Wnt/β‐catenin signaling pathway, which in turn enhanced hepatocellular carcinoma cell stemness. 27 In contrast, one acyl‐carnitine (dodecanoylcarnitine) was positively correlated with lung cancer in our study. Although a similar conclusion was reached in the study by Li et al., 28 the pathophysiological mechanisms of acyl‐carnitine in lung cancer remain poorly understood and further studies are needed.
As substrates for protein synthesis, amino acids are the second largest source of intracellular energy and nutrition after glucose. Amino acids enable tumors to maintain their proliferative drive and play a role in energy production, nucleotide synthesis, and maintenance of cellular redox homeostasis. A total of three amino acid metabolites, arginine, citrulline, and asparagine, were included in the models of this study. Arginine is a basic semi‐essential amino acid in humans and a major component of cytoplasmic and nuclear proteins, and it plays an important role in the regulation of cell proliferation and energy metabolism as a precursor and intermediate involved in oxidative phosphorylation and the urea cycle. The study by Ni et al. on serum target metabolites in lung cancer patients came to the same conclusion as ours, that the serum arginine level is upregulated in lung cancer patients and has potential as a biomarker. This is thought to be possibly related to arginine‐regulated nitric oxide synthesis (NOS), and the upregulation of nitric oxide further affects aspects of tumor proliferation, including oxidative stress, apoptosis, tumor invasion, and metastasis. 29 Arginine deprivation, which involves irreversible hydrolysis of arginine by exogenous arginine deiminase (ADI), has shown promising clinical applications as a treatment for advanced cancer. 30 Studies on the role of citrulline in lung carcinogenesis are limited; it is known as an intermediate of the urea cycle and an innate immune signaling metabolite involved in proinflammatory responses. 31
Our findings suggest that citrulline is positively associated with the risk of lung carcinogenesis. Similar results were reported in a study on diagnostic markers for colorectal cancer, which found that elevated levels of citrulline and altered levels of some other metabolites are risk factors for malignant colorectal lesions, which was attributed to the involvement of citrulline in the tricarboxylic acid (TCA) cycle and colorectal cancer cell signaling. 32 Furthermore, our study showed that asparagine levels were negatively correlated with lung cancer risk, although several studies suggest that elevated asparagine levels promote tumor proliferation through the capacity of asparagine to modulate the activity of activating transcription factor 4 (ATF4) and mTORC1, maintaining tumor cell growth even when the mitochondrial electron transport chain (ETC) is inhibited. 33 However, other studies support a protective role for asparagine against carcinogenesis: Jiang et al., using in vitro experiments combined with animal models, indicated that asparagine upregulation enhances cytotoxic T cell activity, thus directly regulating the immune response at the level of the tumor microenvironment. 34
Abnormal lipid metabolism is one of the main metabolic features of tumors. Fatty acids play an important role in energy metabolism, signaling and cell membrane construction in tumor cells, and are involved in various oncogenic signaling pathways in different cancers. 35 The upregulation of fatty acid synthase expression is influenced by the PI3K/Akt pathway. 36 In contrast, the STK11/LKB1 pathway inhibits cancer development by reducing fatty acid synthesis through regulation of AMP‐activated protein kinase (AMPK) activity. 37 Five fatty acid metabolic indicators, palmitoleic acid, eicosatrienoic acid, oleic acid, total monounsaturated fatty acids, and saturated fatty acids, were included in this study. Among them, palmitoleic acid and eicosatrienoic acid showed a positive association with lung cancer risk, which is consistent with the results of a study on the association of pleural fluid metabolites with lung cancer. That study similarly found that some long‐chain fatty acids are risk factors for lung cancer development; it included three long‐chain unsaturated fatty acids, palmitoleic acid, oleic acid and cis‐8‐eicosatrienoic acid, all of which showed a positive correlation with malignant pleural fluid. 38 Current studies suggest that the proliferation of tumor cells requires an increase in the de novo synthesis of fatty acids, a process that reduces the relative amount of polyunsaturated fatty acids and upregulates the levels of saturated as well as monounsaturated fatty acids in cell membrane lipids. 35 However, in our study, three fatty acid indicators, namely total monounsaturated fatty acids, saturated fatty acids, and oleic acid, were negatively associated with lung cancer risk.
This study constructed lung cancer risk prediction models based on serum small molecule metabolites and visualized them as nomograms. It has the following advantages over previous similar studies. First, the study stratified the lung cancer population by smoking status and constructed independent prediction models for the smoking and nonsmoking populations, which can provide guidance for expanding the coverage of lung cancer screening. Second, a total of 1723 samples were included, of which 1109 were lung cancer cases, and this larger sample size helps to limit data bias and error compared with previous studies. Third, with the help of R and related machine learning algorithms, the models were subjected to more rigorous statistical testing, which yielded objective and convenient lung cancer prediction models.
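As a rough illustration of how a nomogram‐style logistic model turns metabolite levels into a predicted risk, and how the AUC then summarizes discrimination, a minimal Python sketch is given below. It is not the authors' R code or their fitted model. The coefficient magnitudes, the intercept, the standardized metabolite values, and the helper names `predicted_risk` and `auc` are all hypothetical; only the signs of the coefficients follow the directions of association discussed above for smokers.

```python
import math

# Hypothetical coefficients on standardized metabolite levels.
# Signs follow the associations discussed in the text for smokers;
# the magnitudes and the intercept are invented for illustration only.
INTERCEPT = -0.2
COEFS = {"arginine": 0.8, "citrulline": 0.5, "asparagine": -0.6}


def predicted_risk(intercept, coefs, values):
    """Logistic model: risk = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


def auc(risks, labels):
    """AUC as the probability that a random case receives a higher
    predicted risk than a random control (ties count as one half)."""
    cases = [r for r, y in zip(risks, labels) if y == 1]
    controls = [r for r, y in zip(risks, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                wins += 1.0
            elif case_risk == control_risk:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy validation set (made-up values): (standardized levels, 1 = case, 0 = control).
VALIDATION = [
    ({"arginine": 1.1, "citrulline": 0.7, "asparagine": -0.9}, 1),
    ({"arginine": 0.4, "citrulline": 0.2, "asparagine": 0.1}, 1),
    ({"arginine": -0.3, "citrulline": 0.5, "asparagine": 0.8}, 0),
    ({"arginine": -1.0, "citrulline": -0.6, "asparagine": 0.4}, 0),
]

risks = [predicted_risk(INTERCEPT, COEFS, x) for x, _ in VALIDATION]
labels = [y for _, y in VALIDATION]
print("validation AUC:", auc(risks, labels))
```

A nomogram is a graphical rendering of the same linear predictor: each variable's contribution is rescaled to points, and the point total maps to the predicted probability. The pairwise AUC above is quadratic in sample size, which is adequate for a sketch; a rank‐based computation would be preferable for cohorts of the size reported here.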
However, the current study has some limitations that need to be addressed in future work. First, the study included an Asian population from a single center in China, so the conclusions may not be representative of the incidence in other regions and races; multicenter data are still needed for external validation of the applicability of the models. Second, because the samples enrolled in this study were all laboratory data, fatty acid index levels were influenced by the nutritional status of the patients, which may interfere with the final results. Data related to nutritional status, such as body mass index, were not considered, which may affect the results of the correlation analysis among some metabolic indicators. Finally, more clinical characteristics need to be incorporated in subsequent studies to improve the accuracy of the models.
In conclusion, our study identified the characteristics of the metabolic profile in patients with NSCLC in the smoking and nonsmoking populations using a large sample from a single center, and constructed prediction models visualized as nomograms. In the smoking group, propionylcarnitine, isopentenoylcarnitine, hexacosanoylcarnitine, arginine, asparagine, citrulline, palmitoleic acid, and monounsaturated fatty acids were significantly associated with lung cancer incidence. Among nonsmokers, gender, hydroxybutyrylcarnitine, dodecanoylcarnitine, hexacosanoylcarnitine, asparagine, oleic acid, eicosatrienoic acid, and saturated fatty acids were significantly associated with lung cancer incidence. Internal validation supported the ability of the models to predict the risk of lung cancer in both smoking groups.
AUTHOR CONTRIBUTIONS
Xu Zhang and Hui Zhao contributed the central idea, analysed most of the data, and wrote the initial draft of the paper. The remaining authors contributed to refining the ideas, carrying out additional analyses and finalizing this paper.
CONFLICT OF INTEREST STATEMENT
All authors of this manuscript claim no conflicts of interest.
Supporting information
Table S1. Comparison of metabolic profiles between lung cancer populations and healthy controls.
Table S2. Comparison of variables between the training set and validation set.
Table S3. Univariate analysis of the variance in metabolic profiles between lung cancer populations and healthy controls.
Table S4. Coefficients of covariates in the lasso regression.
Zhang X, Wang C, Li C, Zhao H. Development and internal validation of nomograms based on plasma metabolites to predict non‐small cell lung cancer risk in smoking and nonsmoking populations. Thorac Cancer. 2023;14(18):1719–1731. 10.1111/1759-7714.14917
REFERENCES
- 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA‐Cancer J Clin. 2021;71(3):209–49.
- 2. Zheng R, Zhang S, Zeng H, Wang S, Sun K, Chen R, et al. Cancer incidence and mortality in China, 2016. J Natl Cancer Center. 2022;2(1):1–9.
- 3. Zeng H, Chen W, Zheng R, Zhang S, Ji JS, Zou X, et al. Changing cancer survival in China during 2003‐15: a pooled analysis of 17 population‐based cancer registries. Lancet Glob Health. 2018;6(5):e555–67.
- 4. National Lung Screening Trial Research Team, Aberle DR, Adams AM, et al. Reduced lung‐cancer mortality with low‐dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409.
- 5. Wood DE, Kazerooni EA, Aberle D, Berman A, Brown LM, Eapen GA, et al. NCCN guidelines® insights: lung cancer screening, version 1.2022. J Natl Compr Canc Netw. 2022;20(7):754–64.
- 6. Rampinelli C, De Marco P, Origgi D, et al. Exposure to low dose computed tomography for lung cancer screening and risk of cancer: secondary analysis of trial data and risk‐benefit analysis. BMJ. 2017;356:j347.
- 7. Schmidt DR, Patel R, Kirsch DG, Lewis CA, Vander Heiden MG, Locasale JW. Metabolomics in cancer research and emerging applications in clinical oncology. CA‐Cancer J Clin. 2021;71(4):333–58.
- 8. Mikami H, Kimura O, Yamamoto H, Kikuchi S, Nakamura Y, Ando T, et al. A multicentre clinical validation of aminoIndex cancer screening (AICS). Sci Rep. 2019;9(1):13831.
- 9. Ren YP, Tang AG, Zhou QX, Xiang ZY. Clinical significance of simultaneous determination of serum tryptophan and tyrosine in patients with lung cancer. J Clin Lab Anal. 2011;25(4):246–50.
- 10. Yang C, Huang S, Cao F, Zheng Y. A lipid metabolism‐related genes prognosis biomarker associated with the tumor immune microenvironment in colorectal carcinoma. BMC Cancer. 2021;21(1):1182.
- 11. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136(5):E359–86.
- 12. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22.
- 13. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861–74.
- 14. Hosmer DW, Lemeshow S. Assessing the fit of the model. Applied logistic regression. 2nd ed. New York: Wiley; 2005. p. 143–202.
- 15. Fenlon C, O'Grady L, Doherty ML, Dunnion J. A discussion of calibration techniques for evaluating binary and categorical predictive models. Prev Vet Med. 2018;149:107–14.
- 16. Van Calster B, Wynants L, Verbeek JFM, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol. 2018;74(6):796–804.
- 17. Tammemagi CM, Pinsky PF, Caporaso NE, Kvale PA, Hocking WG, Church TR, et al. Lung cancer risk prediction: prostate, lung, colorectal and ovarian cancer screening trial models and validation. J Natl Cancer Inst. 2011;103(13):1058–68.
- 18. Tammemägi MC, Katki HA, Hocking WG, Church TR, Caporaso N, Kvale PA, et al. Selection criteria for lung‐cancer screening. N Engl J Med. 2013;368(8):728–36.
- 19. Tammemägi MC, Ruparel M, Tremblay A, Myers R, Mayo J, Yee J, et al. USPSTF2013 versus PLCOm2012 lung cancer screening eligibility criteria (international lung screening trial): interim analysis of a prospective cohort study. Lancet Oncol. 2022;23(1):138–48.
- 20. Field JK, Vulkan D, Davies MPA, Duffy SW, Gabe R. Liverpool lung project lung cancer risk stratification model: calibration and prospective validation. Thorax. 2021;76(2):161–8.
- 21. Gondo OT, Riu Hamada M, Gondo T, Hamada R. Nomogram as predictive model in clinical practice. Cancer Chemother. 2009;36(6):901–6.
- 22. Smith‐Byrne K, Cerani A, Guida F, Zhou S, Agudo A, Aleksandrova K, et al. Circulating Isovalerylcarnitine and lung cancer risk: evidence from mendelian randomization and prediagnostic blood measurements. Cancer Epidemiol Biomarkers Prev. 2022;31(10):1966–74.
- 23. Jewell JL, Kim YC, Russell RC, et al. Differential regulation of mTORC1 by leucine and glutamine. Science. 2015;347(6218):194–8.
- 24. Pontremoli S, Melloni E, Viotti PL, Michetti M, Di Lisa F, Siliprandi N. Isovalerylcarnitine is a specific activator of the high calcium requiring calpain forms. Biochem Biophys Res Commun. 1990;167(1):373–80.
- 25. Ferrara F, Bertelli A, Falchi M. Evaluation of carnitine, acetylcarnitine and isovalerylcarnitine on immune function and apoptosis. Drugs Exp Clin Res. 2005;31(3):109–14.
- 26. Lu Y, Li N, Gao L, Xu YJ, Huang C, Yu K, et al. Acetylcarnitine is a candidate diagnostic and prognostic biomarker of hepatocellular carcinoma. Cancer Res. 2016;76(10):2912–20.
- 27. Wang J, Zhou Y, Zhang D, Zhao W, Lu Y, Liu C, et al. CRIP1 suppresses BBOX1‐mediated carnitine metabolism to promote stemness in hepatocellular carcinoma. EMBO J. 2022;41(15):e110218.
- 28. Li T, He J, Mao X, Bi Y, Luo Z, Guo C, et al. In situ biomarker discovery and label‐free molecular histopathological diagnosis of lung cancer by ambient mass spectrometry imaging. Sci Rep. 2015;5:14089.
- 29. Ni J, Xu L, Li W, Zheng C, Wu L. Targeted metabolomics for serum amino acids and acylcarnitines in patients with lung cancer. Exp Ther Med. 2019;18(1):188–98.
- 30. Feun LG, Kuo MT, Savaraj N. Arginine deprivation in cancer therapy. Curr Opin Clin Nutr Metab Care. 2015;18(1):78–82.
- 31. Mao Y, Shi D, Li G, Jiang P. Citrulline depletion by ASS1 is required for proinflammatory macrophage activation and immune responses. Mol Cell. 2022;82(3):527–541.e7.
- 32. Mei H, Niu H. Diagnostic value of serum amino acid, CEA and CA19‐9 in benign and malignant colorectal space‐occupying lesions. J Clin Res. 2020;37(8):1145–7.
- 33. Krall AS, Mullen PJ, Surjono F, Momcilovic M, Schmid EW, Halbrook CJ, et al. Asparagine couples mitochondrial respiration to ATF4 activity and tumor growth. Cell Metab. 2021;33(5):1013–1026.e6.
- 34. Wu J, Li G, Li L, Li D, Dong Z, Jiang P. Asparagine enhances LCK signalling to potentiate CD8+ T‐cell activation and anti‐tumour responses. Nat Cell Biol. 2021;23(1):75–86.
- 35. Snaebjornsson MT, Janaki‐Raman S, Schulze A. Greasing the wheels of the cancer machine: the role of lipid metabolism in cancer. Cell Metab. 2020;31(1):62–76.
- 36. Berwick DC, Hers I, Heesom KJ, Moule SK, Tavare JM. The identification of ATP‐citrate lyase as a protein kinase B (Akt) substrate in primary adipocytes. J Biol Chem. 2002;277(37):33895–900.
- 37. Shackelford DB, Shaw RJ. The LKB1‐AMPK pathway: metabolism and growth control in tumour suppression. Nat Rev Cancer. 2009;9(8):563–75.
- 38. Yan Q, Yang Z, Dai J. Free monounsaturated fatty acids in the differential diagnosis of benign and malignant pleural effusions. J Chin Pract Diagn Ther. 2018;8(4):751–4.
