Abstract
Background
The risk of lung cancer in nonsmokers is increasing; however, there are relatively few studies on the risks of lung cancer in nonsmokers.
Patients and Methods
We collected epidemiological and clinical data from 429 nonsmoking patients with lung nodules from the Affiliated Li Huili Hospital as a training cohort and 123 nonsmoking patients with lung nodules as a testing cohort. We identified variables that might be related to malignant lung nodules from 27 variables by performing least absolute shrinkage and selection operator analysis. Univariate and multivariate analyses of these variables were conducted using binary logistic regression. Significant variables were used to generate a lung cancer risk prediction model for nodules in nonsmokers.
Results
We successfully constructed a predictive nomogram incorporating density, ground‐glass opacities, pulmonary nodule size, hypertension, plasma fibrinogen levels, and blood urea nitrogen. This model exhibited good discriminative ability, with a C‐index value of 0.788 (95% confidence interval [CI]: 0.742–0.833) in the training cohort and 0.888 (95% CI: 0.835–0.941) in the testing cohort; it was well‐calibrated in both cohorts. Decision curve analyses supported the clinical value of this predictive nomogram when used at a lung cancer probability threshold of 18%. Ten‐fold cross‐validation indicated good stability and accuracy of the model (kappa = 0.416 ± 0.128; accuracy = 0.751 ± 0.056; area under the curve = 0.768 ± 0.049).
Conclusion
Our risk model can reasonably predict the risks of lung cancer in nonsmoking Chinese patients with lung nodules.
Keywords: lung cancer, model, nonsmokers, pulmonary nodules
We examined in depth the effects of individual variables, at different levels, on different lung nodules through univariate, multivariate, binary, and multinomial logistic regression analyses. The model was constructed on the basis of the significant variables. This nomogram could help predict the risks of lung cancer in nonsmoking patients with lung nodules.

1. INTRODUCTION
A study of 2170 lung cancer patients in the United Kingdom found that the proportion of nonsmokers 1 among these patients was 28% in 2014, up from 13% in 2008. 2 Pelosof et al. 3 reported similar results in an American cohort. According to a study in JAMA Oncology, up to 12% of lung cancer patients in the United States have never smoked, and nonsmokers comprise an increasing proportion of lung cancer patients. 4 These studies suggest that the risk for nonsmokers is increasing in several countries.
Siegel et al. 5 found that the proportion of never‐smokers was higher among patients with adenocarcinoma, consistent with the literature. A large cohort of Chinese patients with lung cancer indicated that the proportion of adenocarcinoma was around 60%. 6
Previous studies on the risk factors for lung cancer in nonsmoking patients focused on epidemiological factors such as secondhand smoke and air pollution (increased PM2.5 7). There are relatively few clinical prediction models for lung cancer in nonsmoking patients. Establishing a multifactor clinical model to predict the likelihood of lung cancer in patients with pulmonary nodules is critical. Therefore, we sought to identify factors that predict malignancy in lung nodules from epidemiological data, clinical examination data, and other readily available data to assist in diagnosis and treatment.
2. MATERIALS AND METHODS
2.1. Patients
The Ethics Committee of the Affiliated Lihuili Hospital of Ningbo University approved the study (number KY2020PJ141). Patients in the training cohort were recruited from the Lihuili Hospital of Ningbo University between October 2020 and February 2022, and the testing cohort was recruited from April 2022 to June 2022. Eligible patients underwent surgical resection of pulmonary nodules after their diagnosis. All patients were informed about the study and gave consent to participate. Nonsmokers were defined as individuals who had smoked fewer than 100 cigarettes in their lifetime. 1 Inclusion criteria were as follows: (1) pulmonary nodules were found on computed tomography (CT); (2) patients had no significant symptoms before their diagnoses; and (3) patients were well enough to undergo surgery and volunteered for the procedure after having been fully informed. Patients diagnosed with severe physical or cognitive impairments or other severe diseases were excluded. We collected data from medical records, including basic information, disease‐related features on clinical examination, and postoperative pathological results.
2.2. Statistical analysis
Data from patient medical records were analyzed using IBM SPSS Statistics 23.0 and R (v 4.2.1; https://www.R‐project.org). Least absolute shrinkage and selection operator (LASSO) analysis was used to reduce the high‐dimensional data and to select the optimal predictors of lung cancer risk among patients with pulmonary nodules. Variables that yielded non‐zero coefficient values in the LASSO regression analysis were retained for model construction. The final prediction model was constructed using univariate logistic regression analysis followed by multivariate logistic regression analysis (all significance tests were two sided). The external testing cohort was used to validate the prediction model, and calibration curves were used to assess the calibration of the nomogram. The model was considered well calibrated when the calibration test result was non‐significant. Harrell's C‐index was used to assess model discrimination performance. Decision curve analyses were used to evaluate the clinical utility of this lung cancer risk nomogram by quantifying the net benefit at various threshold probabilities. The net benefit was calculated by subtracting the proportion of patients with false‐positive results from the proportion with true‐positive results, with the false‐positive proportion weighted by the relative harm of an unnecessary intervention compared with that of failing to intervene. Receiver operating characteristic (ROC) curves were used to assess the model's predictive accuracy. The kappa, accuracy, and area under the curve (AUC) values from 10‐fold cross‐validation in the cohorts were used to validate the model's stability and consistency.
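The net benefit calculation described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R code: the function names `net_benefit` and `net_benefit_treat_all` and the toy data are hypothetical. It assumes the standard decision curve definition, in which the net benefit at threshold probability pt is TP/n − (FP/n) × pt/(1 − pt).

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: the harm of one false positive relative to
    # the benefit of one true positive.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening in every patient (treat-all strategy)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = malignant nodule, 0 = benign nodule.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.80, 0.70, 0.40, 0.30, 0.10, 0.05, 0.10, 0.15, 0.60]

threshold = 0.18
model_nb = net_benefit(y_true, y_prob, threshold)
treat_all_nb = net_benefit_treat_all(y_true, threshold)
treat_none_nb = 0.0  # no interventions, so no true or false positives

print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}, "
      f"treat none: {treat_none_nb:.3f}")
```

A decision curve repeats this calculation over a grid of thresholds; the model is useful at the thresholds where its net benefit exceeds both the treat‐all and the treat‐none values.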
3. RESULTS
3.1. Patient characteristics
The training cohort for this study comprised 429 patients with pulmonary nodules who visited our clinic between October 2020 and February 2022. The external testing cohort comprised 123 patients recorded from April to June 2022. Patients in the training cohort (173 males, 256 females; mean age: 58.36 ± 12.49 years [range: 21–86 years]) and the testing cohort (41 males, 82 females; mean age: 56.53 ± 11.27 years [range: 26–78 years]) were separated into the benign and cancer nodule groups. Table 1 displays the demographic and clinical characteristics.
TABLE 1.
Baseline characteristics.
| Variables | Training cohort (N = 429): Benign, N (%) | Training cohort: Cancer, N (%) | Training cohort: p | Testing cohort (N = 123): Benign, N (%) | Testing cohort: Cancer, N (%) | Testing cohort: p |
|---|---|---|---|---|---|---|
| Location | | | | | | |
| Left lower lobe | 24 (16.11) | 51 (18.21) | 0.593 | 10 (28.57) | 11 (12.50) | 0.113 |
| Other lobes | 79 (53.02) | 134 (47.86) | 0.430 | 14 (40.00) | 44 (50.00) | 0.049 |
| Right upper lobe | 46 (30.87) | 95 (33.93) | 0.926 | 11 (31.43) | 33 (37.50) | 0.073 |
| Border clear | | | | | | |
| No | 23 (15.43) | 51 (18.21) | 0.469 | 1 (2.86) | 6 (6.82) | 0.407 |
| Yes | 126 (84.57) | 229 (81.79) | | 34 (97.14) | 82 (93.18) | |
| Spicule sign | | | | | | |
| No | 132 (88.59) | 249 (88.93) | 0.916 | 35 (100.00) | 76 (86.36) | 0.999 |
| Yes | 17 (11.41) | 31 (11.07) | | 0 (0.00) | 12 (13.64) | |
| Density | | | | | | |
| High | 96 (64.43) | 76 (27.14) | <0.001 | 32 (91.43) | 27 (30.68) | <0.001 |
| Low | 34 (22.82) | 129 (46.07) | <0.001 | 3 (8.57) | 42 (47.73) | <0.001 |
| Intermediate | 19 (12.75) | 75 (26.79) | <0.001 | 0 (0.00) | 19 (21.59) | 0.998 |
| GGO | | | | | | |
| No | 119 (79.87) | 107 (38.21) | <0.001 | 33 (94.29) | 32 (36.36) | <0.001 |
| Yes | 30 (20.13) | 173 (61.79) | | 2 (5.71) | 56 (63.64) | |
| Size | | | | | | |
| <8 mm | 26 (17.45) | 28 (10.00) | 0.029 | 11 (31.43) | 24 (27.27) | 0.010 |
| ≥8 mm | 123 (82.55) | 252 (90.00) | | 24 (68.57) | 64 (72.73) | |
| Blood vessel | | | | | | |
| No | 141 (94.63) | 247 (88.21) | 0.036 | 35 (100.00) | 88 (100.00) | 0.999 |
| Yes | 8 (5.37) | 33 (11.79) | | 0 (0.00) | 0 (0.00) | |
| Cancer history | | | | | | |
| No | 141 (94.63) | 271 (96.79) | 0.281 | 35 (100.00) | 82 (93.18) | 0.999 |
| Yes | 8 (5.37) | 9 (3.21) | | 0 (0.00) | 6 (6.82) | |
| Gender | | | | | | |
| Male | 66 (44.30) | 107 (38.21) | 0.222 | 11 (31.43) | 30 (34.09) | 0.778 |
| Female | 83 (55.70) | 173 (61.79) | | 24 (68.57) | 58 (65.91) | |
| Age (years) | | | | | | |
| ≤65 | 102 (68.46) | 187 (66.79) | 0.725 | 33 (94.29) | 61 (69.32) | 0.009 |
| >65 | 47 (31.54) | 93 (33.21) | | 2 (5.71) | 27 (30.68) | |
| Education (years) | | | | | | |
| Primary (0–6) | 70 (46.98) | 141 (50.36) | 0.505 | 22 (62.86) | 49 (55.68) | 0.468 |
| Higher (>6) | 79 (53.02) | 139 (49.64) | | 13 (37.14) | 39 (44.32) | |
| Drinking history | | | | | | |
| No | 131 (87.92) | 239 (85.36) | 0.464 | 35 (100.00) | 88 (100.00) | 0.999 |
| Yes | 18 (12.08) | 41 (14.64) | | 0 (0.00) | 0 (0.00) | |
| Hypertension status | | | | | | |
| No | 123 (82.55) | 212 (75.71) | 0.105 | 30 (85.71) | 69 (78.41) | 0.360 |
| Yes | 26 (17.45) | 68 (24.29) | | 5 (14.29) | 19 (21.59) | |
| DM status | | | | | | |
| No | 115 (77.18) | 201 (71.79) | 0.228 | 35 (100.00) | 80 (90.91) | 0.999 |
| Yes | 34 (22.82) | 79 (28.21) | | 0 (0.00) | 8 (9.09) | |
| BMI | | | | | | |
| Normal | 104 (69.80) | 213 (76.07) | 0.160 | 13 (37.14) | 46 (52.27) | 0.132 |
| Abnormal | 45 (30.20) | 67 (23.93) | | 22 (62.86) | 42 (47.73) | |
| INR | | | | | | |
| Normal | 141 (94.63) | 264 (94.29) | 0.882 | 35 (100.00) | 88 (100.00) | 0.876 |
| Abnormal | 8 (5.37) | 16 (5.71) | | 0 (0.00) | 0 (0.00) | |
| Plasma fibrinogen level | | | | | | |
| Normal | 126 (84.56) | 202 (72.14) | 0.004 | 25 (71.43) | 65 (73.86) | 0.783 |
| Abnormal | 23 (15.44) | 78 (27.86) | | 10 (28.57) | 23 (26.14) | |
| Plasma albumin | | | | | | |
| Normal | 87 (58.39) | 152 (54.29) | 0.415 | 20 (57.14) | 49 (55.68) | 0.883 |
| Abnormal | 62 (41.61) | 128 (45.71) | | 15 (42.86) | 39 (44.32) | |
| GGT | | | | | | |
| Normal | 131 (87.92) | 254 (90.71) | 0.365 | 35 (100.00) | 88 (100.00) | 0.672 |
| Abnormal | 18 (12.08) | 26 (9.29) | | 0 (0.00) | 0 (0.00) | |
| Blood glucose | | | | | | |
| Normal | 125 (83.89) | 225 (80.36) | 0.369 | 29 (82.86) | 77 (87.50) | 0.502 |
| Abnormal | 24 (16.11) | 55 (19.64) | | 6 (17.14) | 11 (12.50) | |
| CR | | | | | | |
| Normal | 132 (88.59) | 249 (88.93) | 0.916 | 34 (97.14) | 78 (88.64) | 0.168 |
| Abnormal | 17 (11.41) | 31 (11.07) | | 1 (2.86) | 10 (11.36) | |
| BUN | | | | | | |
| Normal | 143 (95.97) | 253 (90.36) | 0.044 | 34 (97.14) | 78 (88.64) | 0.168 |
| Abnormal | 6 (4.03) | 27 (9.64) | | 1 (2.86) | 10 (11.36) | |
| SUA | | | | | | |
| Normal | 128 (85.91) | 224 (79.46) | 0.131 | 28 (80.00) | 74 (84.09) | 0.587 |
| Abnormal | 21 (14.09) | 56 (20.54) | | 7 (20.00) | 14 (15.91) | |
| TG | | | | | | |
| Normal | 109 (72.63) | 225 (80.00) | 0.088 | 29 (82.86) | 70 (79.55) | 0.676 |
| Abnormal | 40 (27.37) | 55 (20.00) | | 6 (17.14) | 18 (20.45) | |
| TC | | | | | | |
| Normal | 120 (80.54) | 231 (82.50) | 0.616 | 32 (91.43) | 78 (88.64) | 0.651 |
| Abnormal | 29 (19.46) | 49 (17.50) | | 3 (8.57) | 10 (11.36) | |
| HDL | | | | | | |
| Normal | 95 (63.76) | 170 (60.71) | 0.537 | 25 (71.43) | 44 (50.00) | 0.033 |
| Abnormal | 54 (36.24) | 110 (39.29) | | 10 (28.57) | 44 (50.00) | |
| LDL | | | | | | |
| Normal | 129 (86.58) | 248 (88.57) | 0.547 | 31 (88.57) | 78 (88.64) | 0.992 |
| Abnormal | 20 (13.42) | 32 (11.43) | | 4 (11.43) | 10 (11.36) | |
3.2. Variable selection and prediction model development
We considered 27 potentially relevant features for inclusion in the prediction model. Twelve variables were ultimately identified using a LASSO regression analysis of the 429 patients in the training cohort (Figure 1A,B). These variables included spicule sign, density, ground‐glass opacities (GGOs), pulmonary nodule size, blood vessel pass‐through, hypertension, alcohol consumption history, plasma fibrinogen levels, blood glucose, blood urea nitrogen (BUN), serum uric acid, and triglyceride levels. Statistical significance was assessed using univariate, followed by multivariate, logistic regression analyses (Table 2). The CT features and corresponding pathological findings of some pulmonary nodules are shown in Figure 2. The p value of the Hosmer–Lemeshow test statistic was 0.835, indicating no significant difference between the predicted and actual values. A nomogram was developed from the prediction model incorporating the significant variables, so that a patient's risk can be scored intuitively by quantifying each variable (Figure 3).
FIGURE 1.

Selection of clinicopathological variables using LASSO binary logistic regression. (A) The tuning parameter (λ) was selected using five‐fold cross‐validation in the LASSO model based on the minimum criteria. Binomial deviance was plotted against log (λ). The selected optimal λ value was 0.024. (B) LASSO coefficient profiles of the 27 potential features plotted against the log (λ) sequence. Twelve features with non‐zero coefficient values were obtained at the optimal λ.
TABLE 2.
Univariate and multivariate logistic regression analysis of the variables for lung nodules of patients in the training cohort.
| Variables | Univariate OR (95% CI) | Univariate p value | Multivariate OR (95% CI) | Multivariate p value |
|---|---|---|---|---|
| Density | | | | |
| High (reference) | – | <0.001 | – | 0.025 |
| Intermediate | 4.99 (2.77–8.96) | <0.001 | 2.50 (1.25–5.00) | 0.010 |
| Low | 4.79 (2.96–7.77) | <0.001 | 1.23 (0.54–2.82) | 0.627 |
| GGO | | | | |
| Yes | 6.41 (4.02–10.23) | <0.001 | 6.07 (2.76–13.34) | <0.001 |
| Size | | | | |
| ≥8 mm | 1.90 (1.07–3.38) | 0.029 | 2.78 (1.39–5.55) | 0.004 |
| Blood vessel | | | | |
| Yes | 2.36 (1.06–5.24) | 0.036 | – | – |
| Hypertension | | | | |
| Yes | 1.52 (0.92–2.51) | 0.105 | 2.34 (1.31–4.18) | 0.004 |
| Plasma fibrinogen | | | | |
| Abnormal | 2.12 (1.26–3.54) | 0.004 | 2.44 (1.36–4.40) | 0.003 |
| BUN | | | | |
| Abnormal | 2.54 (1.03–6.31) | 0.044 | 3.02 (1.10–8.27) | 0.031 |
| SUA | | | | |
| Abnormal | 1.52 (0.88–2.63) | 0.131 | – | – |
| TG | | | | |
| Abnormal | 0.67 (0.42–1.06) | 0.088 | – | – |
Abbreviation: OR, odds ratio.
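For a single binary predictor, the univariate odds ratio and its 95% CI can be recovered directly from a 2 × 2 table of counts. The sketch below is an illustration rather than the authors' SPSS or R procedure: the function name `odds_ratio_wald` is hypothetical, and it assumes a Wald interval on the log‐odds scale. With the training cohort GGO counts from Table 1, it gives values that agree with the univariate GGO estimate in Table 2 after rounding.

```python
import math


def odds_ratio_wald(exposed_cases, unexposed_cases,
                    exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2 x 2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Training cohort counts from Table 1:
# cancer group: 173 with GGO, 107 without; benign group: 30 with GGO, 119 without.
ggo_or, ggo_lower, ggo_upper = odds_ratio_wald(173, 107, 30, 119)
print(f"GGO: OR {ggo_or:.2f} (95% CI {ggo_lower:.2f}-{ggo_upper:.2f})")
```

The multivariate estimates cannot be reproduced this way, because they are adjusted for the other variables in the model and require the patient‐level data.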
FIGURE 2.

The CT features (A) and corresponding pathological findings (B) of different pulmonary nodules. First row: a 52‐year‐old woman with a 16‐mm high‐density nodule that was benign on pathology (hematoxylin/eosin [H&E], ×100). Second row: a 34‐year‐old woman with a 5.7‐mm low‐density nodule that was a carcinoma in situ on pathology (H&E, ×200). Third row: a 62‐year‐old woman with a 10.4‐mm low‐density nodule that was a minimally invasive adenocarcinoma (MIA) on pathology (H&E, ×100). Fourth row: a 63‐year‐old woman with a 7.6‐mm partly solid‐density nodule that was an MIA on pathology (H&E, ×100). Fifth row: a 42‐year‐old woman with a 13‐mm partly solid‐density nodule that was an invasive adenocarcinoma on pathology (H&E, ×100). Sixth row: a 64‐year‐old woman with a 7‐mm high‐density nodule that was an invasive adenocarcinoma on pathology (H&E, ×100).
FIGURE 3.

Lung cancer risk nomogram. This nomogram was developed in the training cohort, incorporating density, GGO, size, hypertension, plasma fibrinogen, and BUN.
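How a nomogram turns a logistic model into points can be sketched from the multivariate odds ratios in Table 2. This is an illustration under stated assumptions, not the authors' nomogram: the names `MULTIVARIATE_OR`, `nomogram_points`, and `relative_odds` are hypothetical, and the 100‐point scaling for the largest coefficient is a common convention that is assumed here. The model intercept is not reported, so only odds relative to a patient at all reference levels can be computed, not an absolute probability.

```python
import math

# Multivariate odds ratios copied from Table 2. Reference levels
# (high density, no GGO, size < 8 mm, and so on) contribute nothing.
# The two density levels are mutually exclusive.
MULTIVARIATE_OR = {
    "density_intermediate": 2.50,
    "density_low": 1.23,
    "ggo_yes": 6.07,
    "size_ge_8mm": 2.78,
    "hypertension_yes": 2.34,
    "fibrinogen_abnormal": 2.44,
    "bun_abnormal": 3.02,
}


def nomogram_points(odds_ratios, max_points=100.0):
    """Scale log-odds coefficients so the largest one maps to max_points."""
    betas = {name: math.log(value) for name, value in odds_ratios.items()}
    scale = max_points / max(betas.values())
    return {name: beta * scale for name, beta in betas.items()}


def relative_odds(present, odds_ratios):
    """Odds of malignancy relative to a patient at every reference level."""
    return math.exp(sum(math.log(odds_ratios[name]) for name in present))


points = nomogram_points(MULTIVARIATE_OR)
example_profile = ["ggo_yes", "size_ge_8mm"]  # hypothetical patient
example_odds = relative_odds(example_profile, MULTIVARIATE_OR)

for name, value in sorted(points.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {value:6.1f} points")
print(f"relative odds for {example_profile}: {example_odds:.2f}")
```

Summing the points for a patient and reading the total against the probability axis is equivalent to adding the log‐odds contributions and applying the logistic function, which is why the intercept would be needed for an absolute risk.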
Nodule density was defined as low when it exhibited a CT value higher than that of pulmonary tissue but lower than that of pulmonary vessels. Nodules with solid and GGO components were designated as having intermediate density. CT values greater than that of pulmonary vessels indicated high‐density nodules.
GGOs were classified as pure GGOs (pGGOs, n = 187) or mixed GGOs (mGGOs, n = 72). We analyzed whether the GGO type differed between benign and malignant nodules. There was no significant difference in univariate analysis (p = 0.192) or in multivariate analysis when the variable was forced into the model (p = 0.218) (Table 3). Backward stepwise regression also revealed no difference between benign and malignant nodules. Nodule size, however, correlated positively with mGGOs when compared with pGGOs.
TABLE 3.
The multivariate logistic regression analysis was performed using the backward likelihood ratio in the training cohort. “Cancer” was eliminated in the second step of the backward stepwise regression.
| Variables | Univariate OR (95% CI) | Univariate p value | Multivariate OR (95% CI) | Multivariate p value |
|---|---|---|---|---|
| Cancer | ||||
| Yes | 1.86 (0.73–4.70) | 0.192 | 1.83 (0.70–4.78) | 0.218 |
| Border clear | ||||
| Yes | 2.02 (0.96–4.25) | 0.065 | – | – |
| Size | ||||
| ≥8 mm | 3.91 (1.34–11.45) | 0.013 | 4.10 (1.39–12.05) | 0.010 |
| Education | ||||
| Primary | 1.65 (0.96–2.86) | 0.073 | – | – |
| BMI | ||||
| Abnormal | 0.47 (0.23–0.94) | 0.032 | 0.45 (0.22–0.90) | 0.025 |
Abbreviation: BMI, body mass index.
3.3. Assessment of predictive risk model performance
Calibration curves showed that this predictive nomogram was well calibrated, and its C‐index value in the training cohort was 0.788 (95% CI: 0.742–0.833) (Figure 4A). The C‐index value for the testing cohort (0.888 [95% CI: 0.835–0.941]) confirmed the discriminative ability of this model, suggesting that it predicts accurately (Figure 4B).
FIGURE 4.

Calibration curves for lung cancer nomogram predictions in the training cohort (A) and external testing cohort (B). The solid line represents the prediction efficacy of the nomogram, and the closer the solid line and diagonal dashed line are, the better the prediction effect will be.
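For a binary outcome, Harrell's C‐index is the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign nodule, which makes it numerically equal to the AUC of the ROC curve. The following is a minimal sketch of that definition; the function name `c_index` and the toy scores are hypothetical, and tied scores are counted as half‐concordant.

```python
def c_index(y_true, y_score):
    """Concordance index for a binary outcome (equals the ROC AUC)."""
    concordant = 0.0
    pairs = 0
    for y_case, s_case in zip(y_true, y_score):
        if y_case != 1:
            continue
        for y_control, s_control in zip(y_true, y_score):
            if y_control != 0:
                continue
            pairs += 1
            if s_case > s_control:
                concordant += 1.0   # case ranked above control
            elif s_case == s_control:
                concordant += 0.5   # tie counts as half
    if pairs == 0:
        raise ValueError("need at least one case and one control")
    return concordant / pairs


# Hypothetical predicted risks: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.5, 0.3, 0.4]
toy_c = c_index(y_true, y_score)
print(f"C-index: {toy_c:.3f}")
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation of malignant from benign nodules.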
3.4. Analysis of model clinical utility
Decision curve analyses for the predictive nomogram were performed (Figure 5A,B). These analyses showed that the nomogram provided a net benefit over the treat‐all and treat‐none strategies when the threshold probability of a patient and a doctor was from >18 to <90% in the training cohort and >3% in the testing cohort. Within these ranges, the nomogram has value as a means of predicting lung cancer risk. The model appeared to perform better in the testing cohort. Within this range, the net benefit was comparable, with some overlap, when lung cancer risk was assessed using this nomogram.
FIGURE 5.

Decision curve analysis for the model in the two cohorts. The decision curves demonstrated that if the threshold probabilities of a patient and a doctor are from >18 to <90% (A) and >3% (B) in our model for the two cohorts, respectively, using this nomogram to predict the risk of lung cancer is more beneficial than a treat‐all or treat‐none interventional scheme.
3.5. Ten‐fold cross‐validation analysis of the model in the two cohorts
The training cohort underwent a 10‐fold cross‐validation analysis (Table 4). Due to the small sample size of the testing cohort, a resampling method was used to verify the model's validity, with each sampling repeated 50 times (Table 5). The kappa values indicated that the model is stable. The accuracy and AUC suggest that the model shows good predictive accuracy in the training cohort (kappa = 0.416 ± 0.128; accuracy = 0.751 ± 0.056; AUC = 0.768 ± 0.049) and the testing cohort (kappa = 0.290 ± 0.251; accuracy = 0.735 ± 0.098; AUC = 0.843 ± 0.114).
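Accuracy and Cohen's kappa for one validation fold can both be derived from the fold's confusion matrix. The sketch below is illustrative only: the function name `accuracy_and_kappa` is hypothetical, and the counts are invented for the example rather than taken from the study. Kappa corrects the observed agreement for the agreement expected by chance given the marginal totals.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa from a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginal rates,
    # summed over the positive and negative classes.
    expected = (((tp + fp) / n) * ((tp + fn) / n)
                + ((fn + tn) / n) * ((fp + tn) / n))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented counts for a fold of 43 patients (not study data).
fold_accuracy, fold_kappa = accuracy_and_kappa(tp=24, fp=7, fn=4, tn=8)
print(f"accuracy = {fold_accuracy:.3f}, kappa = {fold_kappa:.3f}")
```

Because kappa removes chance agreement, it can be much lower than accuracy when one class dominates, and it is 0 when the classifier does no better than chance.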
TABLE 4.
Ten‐fold cross‐validation of the model in the training cohort.
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Kappa | 0.400 | 0.302 | 0.409 | 0.683 | 0.517 | 0.323 | 0.409 | 0.283 | 0.299 | 0.532 |
| Accuracy | 0.738 | 0.700 | 0.744 | 0.860 | 0.791 | 0.721 | 0.744 | 0.674 | 0.721 | 0.814 |
| AUC | 0.747 | 0.735 | 0.767 | 0.864 | 0.808 | 0.711 | 0.724 | 0.745 | 0.751 | 0.824 |
Note: This table shows the kappa, accuracy, and AUC values for the training cohort. The averages across the 10 groups were calculated (kappa = 0.416 ± 0.128; accuracy = 0.751 ± 0.056; AUC = 0.768 ± 0.049).
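The summary values in the note can be reproduced from the per‐group values in Table 4. The short sketch below assumes that the "±" term is the sample standard deviation (n − 1 denominator), which is an assumption rather than something the text states; under that assumption the results agree with the reported values after rounding. The name `TABLE_4` is used only for this example.

```python
from statistics import mean, stdev

# Per-group values copied from Table 4 (training cohort).
TABLE_4 = {
    "kappa": [0.400, 0.302, 0.409, 0.683, 0.517,
              0.323, 0.409, 0.283, 0.299, 0.532],
    "accuracy": [0.738, 0.700, 0.744, 0.860, 0.791,
                 0.721, 0.744, 0.674, 0.721, 0.814],
    "auc": [0.747, 0.735, 0.767, 0.864, 0.808,
            0.711, 0.724, 0.745, 0.751, 0.824],
}

# Mean and sample standard deviation across the 10 groups.
summary = {name: (mean(values), stdev(values))
           for name, values in TABLE_4.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.3f} ± {sd:.3f}")
```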
TABLE 5.
Ten‐fold cross‐validation of the model in the testing cohort.
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Kappa | 0.000 | 0.273 | 0.750 | 0.049 | 0.333 | 0.421 | 0.316 | 0.161 | 0.552 | 0.000 |
| Accuracy | 0.750 | 0.667 | 0.917 | 0.538 | 0.667 | 0.818 | 0.769 | 0.692 | 0.769 | 0.750 |
| AUC | 0.747 | 0.735 | 0.767 | 0.864 | 0.808 | 0.711 | 0.724 | 0.745 | 0.751 | 0.824 |
Note: This table shows the kappa, accuracy, and AUC values for the testing cohort. The averages across the 10 groups were calculated (kappa = 0.290 ± 0.251; accuracy = 0.735 ± 0.098; AUC = 0.843 ± 0.114).
3.6. Binomial and multinomial logistic regression analysis for minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC)
Adenocarcinoma accounted for 84.38% of the lung cancers, and other types of lung cancer accounted for only 15.62%. Adenocarcinoma includes minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC) according to the degree of invasion; therefore, it is necessary to understand the role of each factor in the model across the various types of lung adenocarcinoma. Table 6 displays the results of the multinomial logistic regression models evaluating the associations of these factors with MIA and IAC. In the multivariable models, the variables were associated with MIA and IAC as follows: intermediate density (odds ratio [OR] 5.39 [95% CI: 2.21–13.14] vs. 2.32 [95% CI: 1.10–4.89]), GGO (OR 8.49 [95% CI: 3.62–19.92] vs. 4.69 [95% CI: 2.06–10.70]), and nodule size (OR 2.62 [95% CI: 1.19–5.75] vs. 6.38 [95% CI: 2.59–15.67]). Compared with high‐density pulmonary nodules, intermediate‐density (mixed‐density) pulmonary nodules carried a significantly higher risk of both MIA and IAC, and the risk of MIA was higher. A similar trend was seen for GGO. The higher OR for IAC suggests that the degree of adenocarcinoma infiltration may be greater when the pulmonary nodule size is ≥8 mm. In addition to these variables, abnormal plasma fibrinogen levels and abnormal BUN were risk factors for MIA, and hypertension was a risk factor for IAC in the multivariable models.
TABLE 6.
Odds ratios of different types of invasive adenocarcinomas by related factors compared with cases without lung cancer using multinomial logistic regression model.
| Variables | MIA (n = 149): univariable OR (95% CI) | MIA: univariable p | MIA: multivariable OR (95% CI) | MIA: multivariable p | IAC (n = 159): univariable OR (95% CI) | IAC: univariable p | IAC: multivariable OR (95% CI) | IAC: multivariable p |
|---|---|---|---|---|---|---|---|---|
| Density | – | 0.001 | – | – | – | 0.060 | – | – |
| Low | 3.22 (1.13–9.17) | 0.029 | 3.10 (1.17–8.25) | 0.023 | 1.06 (0.43–2.60) | 0.898 | 1.02 (0.43–2.44) | 0.962 |
| Intermediate | 5.62 (2.21–14.32) | <0.001 | 5.39 (2.21–13.14) | <0.001 | 2.27 (1.06–4.84) | 0.034 | 2.32 (1.10–4.89) | 0.028 |
| GGO (yes) | 8.16 (3.31–20.17) | <0.001 | 8.49 (3.62–19.92) | <0.001 | 4.51 (1.94–10.48) | <0.001 | 4.69 (2.06–10.70) | <0.001 |
| Size (≥8 mm) | 2.91 (1.24–6.82) | 0.014 | 2.62 (1.19–5.75) | 0.017 | 7.12 (2.78–18.17) | <0.001 | 6.38 (2.59–15.67) | <0.001 |
| Hypertension (yes) | 0.95 (0.42–2.23) | 0.971 | 1.14 (0.56–2.33) | 0.716 | 3.94 (2.20–7.07) | <0.001 | 3.63 (2.07–6.35) | <0.001 |
| abPF | 2.90 (1.43–6.00) | 0.003 | 2.48 (1.34–4.58) | 0.004 | 1.60 (0.87–2.95) | 0.134 | 1.56 (0.87–2.81) | 0.135 |
| abBUN | 3.74 (1.21–11.59) | 0.022 | 3.91 (1.40–10.93) | 0.009 | 1.69 (0.57–5.00) | 0.346 | 2.10 (0.78–5.62) | 0.142 |
Abbreviations: abBUN, abnormal blood urea nitrogen; abPF, abnormal plasma fibrinogen levels; IAC, invasive adenocarcinoma; MIA, minimally invasive adenocarcinoma; OR, odds ratio.
3.7. ROC curve analysis
ROC curve analysis showed that the model had good predictive value in both cohorts. The AUC value of our model (blue line) was 0.788 (95% CI: 0.742–0.833) in the training cohort (Figure 6A) and 0.888 (95% CI: 0.833–0.943) in the testing cohort (Figure 6B). By comparison, the AUC values of the parsimonious Brock model 9 (yellow line; AUC, 0.568; 95% CI: 0.510–0.626 in Figure 6A and AUC, 0.704; 95% CI: 0.603–0.805 in Figure 6B) and Farjah's model 8 (red line; AUC, 0.581; 95% CI: 0.524–0.639 in Figure 6A and AUC, 0.768; 95% CI: 0.676–0.859 in Figure 6B) were significantly smaller.
FIGURE 6.

Receiver operating characteristic curve analysis for the training cohort (A) and testing cohort (B). The yellow line stands for parsimonious Brock model (AUC, 0.568; 95% CI:0.510–0.626 in A and AUC, 0.704; 95% CI:0.603–0.805 in B), the red line stands for Farjah's model (AUC, 0.581; 95% CI:0.524–0.639 in A and AUC, 0.768; 95% CI:0.676–0.859 in B) and the blue line stands for our model (AUC, 0.788; 95% CI: 0.742–0.833 in A and 0.888; 95% CI: 0.833–0.943 in B).
4. DISCUSSION
A study of East Asians found higher lung cancer rates in women, a group in which smoking is rare. 10 The risks for nonsmokers might be associated with being of East Asian descent, being female, and having a history of lung adenocarcinoma. 11 A study suggested a possible association with genomic landscape changes such as vascular endothelial growth factor. 12 Currently, there are no prediction models integrating epidemiological factors and clinical examination indicators to assist clinicians in diagnosing and treating pulmonary nodules. Moreover, for variables such as GGO, the surgical criteria are not well defined, and treatments are often based on the experience of the operating surgeons. 13 , 14 In this context, we sought to develop a nomogram that predicts the relative risk of malignancy when evaluating patients with pulmonary nodules.
We developed and validated a prediction model for nonsmoking patients with lung nodules by analyzing and identifying relevant variables. The model assessed the risk of benign or malignant disease for a given lung nodule by analyzing data from patients undergoing pulmonary nodule resection. The model demonstrated good accuracy, calibration, discrimination, and clinical efficacy in our training and testing cohorts. These findings suggest that the nomogram can reliably assess the risk of malignant pulmonary nodules. 15 Ten‐fold cross‐validation supported the stability and consistency of the model.
Previous studies showed that hypertension is common in patients with cancer, 16 including non‐small cell lung cancer. 17 One explanation is that hypertension might increase plasma vascular endothelial growth factor levels, 18 which might account for the importance of hypertension in our model. The combination of fibrinogen β‐chain and fibrinogen γ‐chain had good sensitivity and specificity when distinguishing benign from malignant lung nodules. 19 Using multivariable‐adjusted Cox regression models, Grafetstätter et al. 20 found that fibrinogen was significantly related to lung cancer. We included plasma fibrinogen in our model as a risk factor for nodular lung malignancy. Chang et al. identified BUN as a risk factor for lung cancer and included it in their lung cancer assessment model. 21 Another study demonstrated that the ratio of BUN to serum albumin could be used to predict outcomes in patients with severe lung cancer. 22 The remaining three variables in our model (density, GGO, and nodule size) are commonly reported risk factors.
The univariate and multivariate logistic regression analyses showed that, compared with high‐density nodules, intermediate‐density nodules had a higher risk of lung cancer. This finding was consistent with the qualitative density (nonsolid and part‐solid) identified by Cui et al. 23 Partially solid nodules have a high likelihood of malignancy, according to MacMahon et al. 24 However, when we further compared pGGOs with mGGOs to identify their difference in predicting the overall lung cancer risk, we found no significant difference between them (Table 3).
We refined the concept of lung cancer and classified it into four categories based on pathological findings and proportions: carcinoma in situ, MIA, IAC, and other types of lung cancer. Because of the close relationship between MIA and IAC, we analyzed the differences between each variable in the model using binomial and multinomial logistic regression (Table 6). We found that, relative to benign pulmonary nodules, partly solid GGOs were more strongly associated with MIA than with IAC. This trend was similar to what was reported by Lee et al. 25 Furthermore, the higher OR suggested that a larger nodule indicates a higher degree of adenocarcinoma invasion. Zhan et al. 26 also found that IACs were significantly larger than lung adenocarcinomas in situ and MIAs. The size of pulmonary nodules is essential for assessing GGO invasiveness. One study indicated that the optimal cutoff value for predicting invasive pGGOs was 10 mm. 27 Farjah et al. 8 presented a model with an AUC value of 0.75 (95% CI: 0.72–0.80), including age, gender, body mass index, smoking history, size, and location. Another study found that the parsimonious version of the Brock model had a relatively good predictive accuracy (AUC, 0.713; 95% CI: 0.702–0.724); however, this model did not perform as well as ours in the two cohorts. 9 This result might indicate that these models are limited when used to predict lung cancer in nonsmokers with lung nodules. Those models were applied to the entire population of patients with pulmonary nodules and did not include some essential variables, such as GGOs and laboratory test indices, that are readily available in clinical practice and might be helpful for clinicians.
Early interventions such as family support, biochemical analysis of blood samples, and CT scanning benefit low‐risk patients, while regular follow‐up can ensure the monitoring of pulmonary nodules, timely detection, and treatment of condition changes. Accurate lung nodule assessment can help surgeons determine the lung cancer risk and ensure timely treatment for high‐risk patients while reducing invasive treatment for low‐risk patients. It is challenging to accurately predict the risk of lung cancer in a specific patient. The best solutions involve models based on epidemiological and clinicopathological data.
5. LIMITATIONS
This study has several limitations. There were only 19 lung cancer cases with lymph node metastasis, far fewer than the number of patients without metastasis. The likelihood of malignancy increases with nodule size, and it is reasonable to suppose that size might also correlate positively with the likelihood of lymph node metastasis. However, we could not analyze these differences further because of the limited statistical power. In addition, because ours was a single‐center retrospective study, our model, despite its adequate predictive performance, might not generalize to patients in other countries or regions. Therefore, multi‐center, multi‐regional studies are required.
6. CONCLUSIONS
Our model was accurate, stable, and well‐calibrated when predicting cancer risk in nonsmoking patients with lung nodules. The resulting nomogram can assist clinicians in diagnosis and treatment decisions through quantitative scoring.
AUTHOR CONTRIBUTIONS
Z.F.L. and G.F.S. made all contributions to the design and conception for the study; R.J.Z. analyzed the data; Z.F.L. and G.F.S. wrote and revised the manuscript.
FUNDING INFORMATION
This work was subsidized by Natural Science Foundation of Ningbo Municipality (202003N4269, 2019C50069), the grants of Basic Public Welfare Projects in Zhejiang province (LGF19H020004), Zhejiang Province Medical and Health Project (2017ZD026, 2020KY273), Ningbo Health Branding Subject Fund (PPXK2018‐01).
CONFLICT OF INTEREST
The authors declare no competing interests.
ACKNOWLEDGMENTS
Special thanks to all participants who took part in this study.
Liao Z, Zheng R, Shao G. A lung cancer risk prediction model for nonsmokers: A retrospective analysis of lung nodule cohorts in China. J Clin Lab Anal. 2022;36:e24748. doi: 10.1002/jcla.24748
Zufang Liao and Rongjiong Zheng contributed equally to this study.
DATA AVAILABILITY STATEMENT
All data are fully available from the corresponding author upon reasonable request.
REFERENCES
- 1. Rivera GA, Wakelee H. Lung cancer in never smokers. Adv Exp Med Biol. 2016;893:43‐57. doi: 10.1007/978-3-319-24223-1_3
- 2. Cufari ME, Proli C, De Sousa P, et al. Increasing frequency of non‐smoking lung cancer: presentation of patients with early disease to a tertiary institution in the UK. Eur J Cancer. 2017;84:55‐59. doi: 10.1016/j.ejca.2017.06.031
- 3. Pelosof L, Ahn C, Gao A, et al. Proportion of never‐smoker non‐small cell lung cancer patients at three diverse institutions. J Natl Cancer Inst. 2017;109(7):djw295. doi: 10.1093/jnci/djw295
- 4. Siegel DA, Fedewa SA, Henley SJ, Pollack LA, Jemal A. Proportion of never smokers among men and women with lung cancer in 7 US states. JAMA Oncol. 2021;7(2):302‐304. doi: 10.1001/jamaoncol.2020.6362
- 5. Subramanian J, Govindan R. Lung cancer in never smokers: a review. J Clin Oncol. 2007;25(5):561‐570. doi: 10.1200/JCO.2006.06.8015
- 6. Zhang L, Li M, Wu N, Chen Y. Time trends in epidemiologic characteristics and imaging features of lung adenocarcinoma: a population study of 21,113 cases in China. PLoS One. 2015;10(8):e0136727. doi: 10.1371/journal.pone.0136727
- 7. Tseng CH, Tsuang BJ, Chiang CJ, et al. The relationship between air pollution and lung cancer in nonsmokers in Taiwan. J Thorac Oncol. 2019;14(5):784‐792. doi: 10.1016/j.jtho.2018.12.033
- 8. Farjah F, Monsell SE, Greenlee RT, et al. Patient and nodule characteristics associated with a lung cancer diagnosis among individuals with incidentally detected lung nodules. Chest. 2022;3692(22):03900‐9. doi: 10.1016/j.chest.2022.09.030. Epub ahead of print.
- 9. Vachani A, Zheng C, Amy Liu IL, Huang BZ, Osuji TA, Gould MK. The probability of lung cancer in patients with incidentally detected pulmonary nodules: clinical characteristics and accuracy of prediction models. Chest. 2022;161(2):562‐571. doi: 10.1016/j.chest.2021.07.2168
- 10. Ferlay J, Soerjomataram I, Dikshit R, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136(5):E359‐E386. doi: 10.1002/ijc.29210
- 11. Toh CK, Gao F, Lim WT, et al. Never‐smokers with lung cancer: epidemiologic evidence of a distinct disease entity. J Clin Oncol. 2006;24(15):2245‐2251. doi: 10.1200/JCO.2005.04.8033
- 12. Thu KL, Vucic EA, Chari R, et al. Lung adenocarcinoma of never smokers and smokers harbor differential regions of genetic alteration and exhibit different levels of genomic instability. PLoS One. 2012;7(3):e33003. doi: 10.1371/journal.pone.0033003
- 13. Migliore M, Fornito M, Palazzolo M, et al. Ground glass opacities management in the lung cancer screening era. Ann Transl Med. 2018;6(5):90. doi: 10.21037/atm.2017.07.28
- 14. Lococo F, Cusumano G, Cardillo G, SICT PNR‐Working Group. It's unnecessary to perform N1‐N2 sampling/dissection in predominantly‐GGO cStage‐I lung cancer? Ann Thorac Surg. 2021;111(4):1405‐1406. doi: 10.1016/j.athoracsur.2020.05.168
- 15. Wei L, Champman S, Li X, et al. Beliefs about medicines and non‐adherence in patients with stroke, diabetes mellitus and rheumatoid arthritis: a cross‐sectional study in China. BMJ Open. 2017;7(10):e017293.
- 16. Wong BS, Chiu LY, Tu DG, Sheu GT, Chan TT. Anticancer effects of antihypertensive l‐type calcium channel blockers on chemoresistant lung cancer cells via autophagy and apoptosis. Cancer Manag Res. 2020;13(12):1913‐1927.
- 17. Yilmaz A, Mohamed N, Patterson KA, et al. Clinical and metabolic parameters in non‐small cell lung carcinoma and colorectal cancer patients with and without KRAS mutations. Int J Environ Res Public Health. 2014;11(9):8645‐8660. doi: 10.3390/ijerph110908645
- 18. Yang P, Deng W, Han Y, et al. Analysis of the correlation among hypertension, the intake of β‐blockers, and overall survival outcome in patients undergoing chemoradiotherapy with inoperable stage III non‐small cell lung cancer. Am J Cancer Res. 2017;7(4):946‐954.
- 19. Kuang M, Peng Y, Tao X, et al. FGB and FGG derived from plasma exosomes as potential biomarkers to distinguish benign from malignant pulmonary nodules. Clin Exp Med. 2019;19(4):557‐564.
- 20. Grafetstätter M, Hüsing A, González Maldonado S, et al. Plasma fibrinogen and sP‐selectin are associated with the risk of lung cancer in a prospective study. Cancer Epidemiol Biomark Prev. 2019;28(7):1221‐1227.
- 21. Matsuguma H, Yokoi K, Anraku M, et al. Proportion of ground‐glass opacity on high‐resolution computed tomography in clinical T1 N0 M0 adenocarcinoma of the lung: a predictor of lymph node metastasis. J Thorac Cardiovasc Surg. 2002;124(2):278‐284. doi: 10.1067/mtc.2002.122298
- 22. Heo EY, Lee KW, Jheon S, Lee JH, Lee CT, Yoon HI. Surgical resection of highly suspicious pulmonary nodules without a tissue diagnosis. Jpn J Clin Oncol. 2011;41(8):1017‐1022. doi: 10.1093/jjco/hyr073
- 23. Cui X, Heuvelmans MA, Fan S, et al. A subsolid nodules imaging reporting system (SSN‐IRS) for classifying 3 subtypes of pulmonary adenocarcinoma. Clin Lung Cancer. 2020;21(4):314‐325.e4. doi: 10.1016/j.cllc.2020.01.014
- 24. MacMahon H, Naidich DP, Goo JM, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner Society 2017. Radiology. 2017;284(1):228‐243. doi: 10.1148/radiol.2017161659
- 25. Lee JH, Park CM, Lee SM, Kim H, McAdams HP, Goo JM. Persistent pulmonary subsolid nodules with solid portions of 5 mm or smaller: their natural course and predictors of interval growth. Eur Radiol. 2016;26(6):1529‐1537. doi: 10.1007/s00330-015-4017-4
- 26. Zhan Y, Peng X, Shan F, et al. Attenuation and morphologic characteristics distinguishing a ground‐glass nodule measuring 5‐10 mm in diameter as invasive lung adenocarcinoma on thin‐slice CT. AJR Am J Roentgenol. 2019;213(4):W162‐W170. doi: 10.2214/AJR.18.21008
- 27. Lee SM, Park CM, Goo JM, Lee HJ, Wi JY, Kang CH. Invasive pulmonary adenocarcinomas versus preinvasive lesions appearing as ground‐glass nodules: differentiation by using CT features. Radiology. 2013;268(1):265‐273. doi: 10.1148/radiol.13120949
