Chest. 2022 Sep 30;163(3):719–730. doi: 10.1016/j.chest.2022.09.030

Patient and Nodule Characteristics Associated With a Lung Cancer Diagnosis Among Individuals With Incidentally Detected Lung Nodules

Farhood Farjah a, Sarah E Monsell b, Robert T Greenlee d, Michael K Gould e, Rebecca Smith-Bindman f, Matthew P Banegas g, Kurt Schoen d, Arvind Ramaprasan c, Diana SM Buist c
PMCID: PMC10154904  PMID: 36191633

Abstract

Background

Pulmonary nodules are a common incidental finding on CT imaging. Few studies have described patient and nodule characteristics associated with a lung cancer diagnosis using a population-based cohort.

Research Question

Does a relationship exist between patient and nodule characteristics and lung cancer among individuals with incidentally detected pulmonary nodules, and can this information be used to create exploratory lung cancer prediction models with reasonable performance characteristics?

Study Design and Methods

We conducted a retrospective cohort study of adults 18 years of age or older with lung nodules of any size incidentally detected by chest CT imaging between 2005 and 2015. All patients had at least 2 years of complete follow-up. To evaluate the relationship between patient and nodule characteristics and lung cancer, we used binomial regression. We used logistic regression to create prediction models, and we internally validated model performance using bootstrap optimism correction.

Results

Among 7,240 patients, the median age was 67 years, 56% were women, the median BMI was 28 kg/m2, 56% had ever smoked, and 31% had a prior nonlung malignancy. The median nodule size was 5.6 mm, 57% of patients had multiple nodules, and 40% had an upper lobe nodule. A total of 265 patients (3.7%; 95% CI, 3.2%-4.1%) received a diagnosis of lung cancer. In a multivariate analysis, age, sex, BMI, smoking history, and nodule size and location were associated with a lung cancer diagnosis, whereas prior malignancy and nodule number and laterality were not. We were able to construct two prediction models with an optimism-corrected area under the curve of 0.75 (95% CI, 0.72-0.80) and reasonable calibration.

Interpretation

Lung cancer is uncommon among individuals with incidentally detected lung nodules. Some, but not all, previously identified factors associated with lung cancer also were associated with this outcome in this sample. These findings may have implications for clinical practice, future practice guidelines, and the development of novel lung cancer prediction models for individuals with incidentally detected lung nodules.

Key Words: imaging, incidental findings, lung cancer, lung nodule



Take-home Points.

Study Question: What patient and nodule characteristics are associated with lung cancer diagnosis in a population-based cohort of adults 18 years of age or older with incidentally detected pulmonary nodules?

Results: Among 7,240 individuals with an incidentally detected lung nodule—of whom 265 (3.7%; 95% CI, 3.2%-4.1%) received a lung cancer diagnosis—age, sex, BMI, smoking history, and nodule size and location were associated with lung cancer, but prior nonlung malignancy and nodule laterality and number were not.

Interpretation: A better understanding of patient and nodule characteristics associated with a lung cancer diagnosis in a population-based cohort of individuals with incidentally detected pulmonary nodules may have implications for patient care, future iterations of practice guidelines, and the development and validation of novel lung cancer prediction models.

More than 1.6 million adults are identified as having an incidentally detected lung nodule each year in the United States.1 Although approximately 95% of nodules are benign, patients and clinicians worry about the diagnosis of lung cancer.1, 2, 3, 4, 5 Practice guidelines intend to maximize the benefits of early detection of cancer and to minimize the harms of diagnostic testing by varying the intensity of the evaluation based on the probability of lung cancer.6,7 This probability is either estimated using prediction models8 or is inferred based on patient and nodule characteristics.6,7

Until recently, no population-based data on individuals with incidentally detected lung nodules were readily available. Gaps in knowledge have been bridged by expert opinion, studies of select populations with a high prevalence of lung cancer,9, 10, 11, 12, 13, 14, 15 or extrapolation from studies of individuals with screen-detected lung nodules.16, 17, 18 Consequently, clinicians and practice guidelines have had to make recommendations with limited evidence, and prediction models may not perform consistently across various care settings and populations.

We undertook this study to describe the clinical epidemiologic features of incidentally detected lung nodules using a population-based cohort. A secondary hypothesis-generating aim was to develop exploratory prediction models using these data.

Study Design and Methods

Study Design and Data Source

We conducted a retrospective cohort study of adults 18 years of age or older with ≥ 1 incidentally detected lung nodules found by CT scan imaging between 2005 and 2015. Study participants included patients from Kaiser Permanente Washington and Marshfield Clinic in Wisconsin. These health systems have electronic health records and harmonized variable definitions for demographics, smoking history, health care use, cancer incidence, and vital status.19 Institutional review boards ceded oversight to Kaiser Permanente Washington, which approved the study and a waiver of individual consent (Identifier: 1042944). This article adheres to Strengthening the Reporting of Observational Studies in Epidemiology reporting guidelines (e-Table 1).20

Study Population

We used natural language processing to screen chest CT scan free-text radiology reports for documentation of a lung nodule.1,21,22 Next, trained abstractors: (1) confirmed that a radiologist documented a new lung nodule (not including calcified nodules consistent with granulomas), (2) recorded the indication for CT scan imaging, and (3) collected information about documented findings and the radiologist’s impression.2,22 The base cohort (N = 10,329) included patients with an incidentally detected lung nodule who had undergone chest CT scan imaging for reasons other than lung cancer screening, lung cancer surveillance, or follow-up of a previously identified nodule. We excluded patients with radiographic evidence of locally advanced or metastatic lung cancer (n = 669), fever as an indication for imaging (n = 118), or a documented radiologist concern for infection as the cause of the nodule (n = 930). Because we sought a case complete analysis (discussed herein), we excluded patients with missing BMI (n = 1,181), missing smoking history (n = 6), and missing nodule location (n = 185) (e-Table 2).
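The cohort derivation above can be checked with simple arithmetic. The following is a minimal sketch that assumes the exclusion counts are mutually exclusive and applied in sequence, as the totals suggest; the variable names are illustrative and do not come from the study.

```python
# Counts quoted in the Study Population paragraph; names are illustrative only.
base_cohort = 10_329

clinical_exclusions = {
    "locally advanced or metastatic lung cancer on imaging": 669,
    "fever as indication for imaging": 118,
    "radiologist concern for infection": 930,
}
missing_covariates = {
    "BMI": 1_181,
    "smoking history": 6,
    "nodule location": 185,
}

# Assumption: no patient is counted in more than one exclusion category.
after_clinical = base_cohort - sum(clinical_exclusions.values())
total_missing = sum(missing_covariates.values())
analytic_cohort = after_clinical - total_missing

print(after_clinical, total_missing, analytic_cohort)
```

Under that assumption the result matches the 7,240 patients analyzed and the 1,372 patients reported as having missing covariate data.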

Patient and Nodule Characteristics

We defined age as age at first nodule detection. We measured BMI if it was recorded in the medical record within 1 year of the first nodule detection. To determine a history of prior malignancy (other than lung cancer, which was an exclusion criterion), we used either an entry within each health system’s cancer registry or documentation within free-text radiology reports indicating a history of prior cancer. Both health systems collected information on patient smoking history. We defined nodule size and location based on the most concerning nodule noted in the radiology report, and we considered enumeration of or language referring to > 1 nodule to constitute multiple nodules. We selected covariates with previously described associations with lung cancer.9, 10, 11, 12, 13, 14, 15,18,23,24 We measured race and ethnicity to describe the generalizability of the patient population, but we do not consider these variables to be biological factors that lead to lung cancer.

Ascertainment of a Lung Cancer Diagnosis and Vital Status

We used the Western Washington Surveillance Epidemiology and End-Results Registry and the Wisconsin Cancer Reporting System to identify lung cancer diagnoses (International Classification of Diseases for Oncology codes C340-C349 excluding histology codes 9590-9989, 9050-9055, and 9140). Linkages to these registries allowed us to ascertain lung cancer diagnoses longitudinally, independent of whether the patient remained enrolled in the health system. Additionally, both health systems link their records to state death registries, which allowed us to measure vital status independent of enrollment status. Because the recommended duration of surveillance for solid lung nodules is 2 years,6,7 we measured lung cancer diagnoses that occurred within 2 years of nodule detection. All patients underwent 2 years of follow-up unless they died within 2 years of nodule diagnosis.

Descriptive Statistics and Analysis

We described patient and nodule characteristics using frequencies and percentages for categorical variables and mean ± SD and median (interquartile range) for continuous variables. To explore the potential impact of the competing risk of death, we calculated the probability of lung cancer using: (1) a time-to-event estimate of the 2-year incidence of lung cancer diagnosis that accounted for the competing risk of death,25 (2) a Kaplan-Meier time-to-event estimate of the 2-year cumulative incidence of lung cancer diagnoses censoring patients who died within 2 years of lung nodule detection and without a lung cancer diagnosis, or (3) a proportion in which the numerator included all patients with a lung cancer diagnosis and the denominator consisted of the entire cohort regardless of vital status. We used the Wilson score interval to estimate 95% CIs for proportions.26 Because the probability of lung cancer was similar across estimation methods (competing risk, 3.7% [95% CI, 3.2%-4.1%]; Kaplan-Meier, 3.8% [95% CI, 3.3%-4.2%]; proportion, 3.7% [95% CI, 3.2%-4.1%]), we used proportions for our analyses. To understand the univariate and multivariate relationships between patient and nodule factors and the probability of lung cancer, we used binomial regression with a log link. For continuous variables in the multivariate analysis, we used natural splines with knots at observed quartiles. Because it is difficult to interpret parameter estimates for continuous variables modelled as splines, we used a likelihood ratio test to compare models with and without the continuous variable to determine if the variable led to improved model fit as a marker of association. No outcome data were missing in this study, although covariate data were missing for 1,372 patients. 
The distributions of measured variables were similar between those with and without missing data with two exceptions: missing data occurred more commonly in health system A and occurred less frequently over time (e-Tables 3, 4). No differences were found in the probability of lung cancer between those with and without missing covariate data (3.9% [95% CI, 3.0%-5.1%] vs 3.7% [95% CI, 3.3%-4.1%]). We performed a case complete analysis.
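The similarity of the three estimates may be easier to see with a toy calculation. The sketch below uses a synthetic cohort (made-up numbers, not study data) and assumes every patient is followed to the end of the window unless cancer or death occurs first, which mirrors the complete follow-up described above. The function name and status codes are hypothetical.

```python
from collections import Counter

CANCER, DEATH, ALIVE = 1, 2, 0  # illustrative status codes, not from the study

def two_year_estimates(records):
    """Return (competing-risk CIF, 1 - Kaplan-Meier, crude proportion) for cancer.

    records: list of (time_in_years, status). Assumes every patient is followed
    to the end of the window unless cancer or death occurs first.
    """
    n = len(records)
    cancers = Counter(t for t, s in records if s == CANCER)
    deaths = Counter(t for t, s in records if s == DEATH)

    at_risk = n
    km_surv = 1.0     # cancer-free survival with deaths treated as censored
    event_free = 1.0  # probability of neither cancer nor death so far
    cif = 0.0         # cumulative incidence of cancer, death as competing risk

    for t in sorted(set(cancers) | set(deaths)):
        d_cancer, d_death = cancers[t], deaths[t]
        cif += event_free * d_cancer / at_risk
        km_surv *= 1 - d_cancer / at_risk
        event_free *= 1 - (d_cancer + d_death) / at_risk
        at_risk -= d_cancer + d_death

    return cif, 1 - km_surv, sum(cancers.values()) / n

# Synthetic toy cohort of 100 patients (made-up numbers, for illustration only).
toy = ([(0.5, DEATH)] * 5 + [(1.0, CANCER)] * 2 +
       [(1.2, DEATH)] * 5 + [(1.5, CANCER)] * 2 + [(2.0, ALIVE)] * 86)

cif, km, prop = two_year_estimates(toy)
print(f"competing risk {cif:.4f}, Kaplan-Meier {km:.4f}, proportion {prop:.4f}")
```

When no one is lost to follow-up, the competing-risk estimate coincides with the crude proportion, and the Kaplan-Meier estimate that censors deaths is slightly higher. This is consistent with the pattern reported here (3.7%, 3.8%, and 3.7%).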

Prediction Model Development and Internal Validation

We developed exploratory prediction models with varying degrees of parsimony (e-Table 5). The full model included all measured patient and nodule characteristics based on a priori knowledge of a relationship with lung cancer,9, 10, 11, 12, 13, 14, 15,18,23,24 whereas the significant factors model included only variables with a statistically significant relationship with the probability of lung cancer in the multivariate analysis. We also developed a parsimonious model based on factors with high magnitude associations with the outcome. Pepe et al27 demonstrated that even biomarkers with an OR as high as 3 can be a poor classifier of outcomes. Based on this observation, we selected variables for inclusion in the parsimonious prediction model if the lower bound of the 95% CI from our univariate analysis was > 3.0. We assessed model performance by evaluating discrimination and calibration. To safeguard against overly optimistic results when using the full data set to estimate model coefficients, we used a bootstrap optimism correction (200 samples) to estimate the area under the curve (AUC) and nonparametric 95% CIs (500 replicates of each of the 200 samples).28,29 Similarly, we used optimism correction to create calibration plots showing observed vs predicted probability of lung cancer. Analyses were performed using R version 4.0.3 software (R Foundation for Statistical Computing).
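The bootstrap optimism correction described above is commonly written as follows. This is a sketch of the standard procedure in our own notation; the text does not give the authors' exact implementation. Here D is the original data set, D_b is the b-th bootstrap resample, M_D and M_b are the models fitted to each, and B is the number of resamples (200 in this study).

```latex
\widehat{O} \;=\; \frac{1}{B}\sum_{b=1}^{B}
  \Bigl[\mathrm{AUC}(M_b, D_b) - \mathrm{AUC}(M_b, D)\Bigr],
\qquad
\mathrm{AUC}_{\text{corrected}} \;=\; \mathrm{AUC}(M_D, D) - \widehat{O}
```

Under this reading, the difference between a naive and an optimism-corrected AUC is the estimated optimism.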

Results

Among 7,240 individuals with incidentally detected lung nodules (Table 1), the median age was 67 years and 56% were women. Most patients were White (88%) and non-Hispanic (95%). The median BMI was 28 kg/m2. Thirty-one percent of patients had a history of a prior nonlung malignancy. Fifty-six percent formerly or currently smoked at the time of nodule detection. Median and mean nodule sizes were 5.6 mm and 7.3 mm, respectively. Thirty-four percent had nodules of ≤ 4 mm, and 26% had nodules of > 8 mm. Fifty-seven percent had multiple nodules, and 40% had an upper lobe nodule. Within 2 years of lung nodule detection, 265 patients (3.7%; 95% CI, 3.2%-4.1%) received a diagnosis of lung cancer.

Table 1.

Characteristics of Individuals With Incidentally Detected Lung Nodules (n = 7,240)

Variable Data
Age, y
 Median (IQR) 67 (19)
 Mean ± SD 66 ± 14
 < 35 194 (3)
 35-44 344 (5)
 45-54 883 (12)
 55-64 1,726 (24)
 65-74 1,999 (28)
 75-84 1,528 (21)
 85+ 566 (8)
Sex
 Female 4,042 (56)
 Male 3,198 (44)
Race
 White 6,340 (88)
 Black 181 (3)
 Asian 304 (4)
 American Indian/Alaska Native 37 (1)
 Native Hawaiian/Pacific Islander 19 (0)
 Other 74 (1)
 Multirace 105 (1)
 Missing 180 (2)
Ethnicity
 Non-Hispanic 6,866 (95)
 Hispanic 191 (3)
 Missing 183 (3)
BMI, kg/m2
 Median (IQR) 28 (8)
 Mean ± SD 29 ± 6
 Underweight (< 18.5) 120 (2)
 Normal (18.5-24.9) 1,905 (26)
 Overweight (25-29.9) 2,435 (34)
 Obesity (≥ 30) 2,780 (38)
Prior nonlung malignancy
 No 4,981 (69)
 Yes 2,259 (31)
Smoking history
 Never 3,206 (44)
 Ever (former or current) 4,034 (56)
Nodule size (dominant lesion), mm
 Median (IQR) 5.6 (5)
 Mean ± SD 7.3 ± 5.4
 ≤ 4 2,438 (34)
 > 4-≤ 6 1,919 (27)
 > 6-≤ 8 1,003 (14)
 > 8-≤ 10 557 (8)
 > 10-≤ 20 1,030 (14)
 > 20-≤ 30 293 (4)
No. of nodules
 Single 3,102 (43)
 Multiple 4,138 (57)
Nodule location (dominant lesion)
 Right upper lobe 1,740 (24)
 Right middle lobe 1,019 (14)
 Right lower lobe 1,572 (22)
 Right upper and lower lobes 16 (0)
 Right upper and middle lobes 19 (0)
 Right middle and lower lobes 9 (0)
 Right side not otherwise specified 177 (2)
 Left upper lobe 1,164 (16)
 Left lower lobe 1,412 (20)
 Left upper and lower lobes 12 (0)
 Left side not otherwise specified 100 (1)
Health system
 A 2,325 (32)
 B 4,915 (68)
Year of nodule diagnosis
 2005 310 (4)
 2006 485 (7)
 2007 543 (8)
 2008 555 (8)
 2009 572 (8)
 2010 578 (8)
 2011 710 (10)
 2012 793 (11)
 2013 860 (12)
 2014 918 (13)
 2015 916 (13)

Data are presented as No. (%), mean ± SD, or median (interquartile range). IQR = interquartile range.

The 2-year probability of lung cancer varied by age, smoking status, nodule size, nodule location (upper vs nonupper lobe), and health system. The 2-year probability of lung cancer was higher than the overall population mean for individuals who had ever smoked (5.5%; 95% CI, 4.9%-6.3%), with nodules of > 10 mm to ≤ 20 mm (12.3%; 95% CI, 10.5%-14.5%), with nodules of > 20 mm to ≤ 30 mm (25.3%; 95% CI, 20.6%-30.5%), or with upper lobe nodules (5.3%; 95% CI, 4.5%-6.2%). The probability of lung cancer did not vary by sex, BMI, history of a prior nonlung malignancy, nodule number or laterality, or year of nodule diagnosis (Table 2). When evaluated as continuous variables, age and BMI seemed to have a parabolic relationship (U shape) with the 2-year probability of lung cancer, whereas nodule size had a sigmoid relationship (S shape) with the probability of lung cancer (Fig 1). An examination of univariate associations on the multiplicative scale revealed similar relationships with one exception: year of nodule diagnosis was related inversely to the probability of lung cancer (Table 3).

Table 2.

Probability of Lung Cancer by Patient and Nodule Characteristics

Variable Lung Cancer Diagnoses Within 2 Y of Nodule Detection Probability of Lung Cancer (95% CI), %
Age, y
 < 35 0 0.0 (0.0-1.9)
 35-44 1 0.3 (0.1-1.6)
 45-54 20 2.3 (1.5-3.5)
 55-64 51 3.0 (2.3-3.9)
 65-74 91 4.6 (3.7-5.6)
 75-84 78 5.1 (4.1-6.3)
 85+ 24 4.2 (2.9-6.2)
Sex
 Female 156 3.9 (3.3-4.5)
 Male 109 3.4 (2.8-4.1)
BMI, kg/m2
 Underweight (< 18.5) 5 4.2 (1.8-9.4)
 Normal (18.5-24.9) 77 4.0 (3.2-5.0)
 Overweight (25-29.9) 95 3.9 (3.2-4.7)
 Obesity (≥ 30) 88 3.2 (2.6-3.9)
Prior nonlung malignancy
 No 186 3.7 (3.2-4.3)
 Yes 79 3.5 (2.8-4.3)
Smoking status
 Never 42 1.3 (1.0-1.8)
 Ever (former or current) 223 5.5 (4.9-6.3)
Nodule size, mm
 ≤ 4 13 0.5 (0.3-0.9)
 > 4-≤ 6 16 0.8 (0.5-1.4)
 > 6-≤ 8 19 1.9 (1.2-2.9)
 > 8-≤ 10 16 2.9 (1.8-4.6)
 > 10-≤ 20 127 12.3 (10.5-14.5)
 > 20-≤ 30 74 25.3 (20.6-30.5)
No. of nodules
 Single 118 3.8 (3.2-4.5)
 Multiple 147 3.6 (3.0-4.2)
Nodule location
 Nonupper lobe 111 2.6 (2.1-3.1)
 Upper lobe 154 5.3 (4.5-6.2)
Health system
 A 57 2.5 (1.9-3.2)
 B 208 4.2 (3.7-4.8)
Year of nodule diagnosis
 2005 16 5.2 (3.2-8.2)
 2006 21 4.3 (2.8-6.5)
 2007 21 3.9 (2.5-5.8)
 2008 22 4.0 (2.6-5.9)
 2009 24 4.2 (2.8-6.2)
 2010 21 3.6 (2.5-5.5)
 2011 22 3.1 (2.1-4.6)
 2012 34 4.3 (3.1-5.9)
 2013 31 3.6 (2.6-5.1)
 2014 34 3.7 (2.7-5.1)
 2015 19 2.1 (1.3-3.2)
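
The intervals in Table 2 can be checked against the Wilson score method named in the methods. The sketch below is a minimal implementation; the function name is ours, event counts are taken from Table 2, and stratum denominators are taken from Table 1.

```python
import math

def wilson_interval(events, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

# (label, lung cancer diagnoses, patients in stratum)
rows = [("Age < 35", 0, 194), ("Age 35-44", 1, 344), ("Nodule > 20-<= 30 mm", 74, 293)]
for label, events, n in rows:
    lo, hi = wilson_interval(events, n)
    print(f"{label}: {100 * events / n:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```

Unlike the simple normal approximation, the Wilson interval stays within 0% to 100% and gives a nonzero upper bound when no events are observed, as in the youngest age group.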

Figure 1.

A-C, Line graphs showing the relationship between continuous variables and the probability of lung cancer: relationship between age and probability of lung cancer (A), relationship between BMI and probability of lung cancer (B), and relationship between nodule size and probability of lung cancer (C). Red shading represents the 95% CI.

Table 3.

Factors Associated With Lung Cancer

Variable Univariate RR (95% CI) Multivariate^a RR (95% CI)
Age, y
 ≤ 44 0.04 (0.01-0.29) ...
 45-54 0.50 (0.31-0.80) ...
 55-64 0.65 (0.46-0.91) ...
 65-74 Reference ...
 75-84 1.12 (0.83-1.51) ...
 85+ 0.93 (0.60-1.45) ...
Sex
 Male Reference Reference
 Female 1.13 (0.89-1.44) 1.51 (1.21-1.88)
BMI, kg/m2
 Underweight (< 18.5) 1.03 (0.43-2.50) ...
 Normal (18.5-24.9) Reference ...
 Overweight (25-29.9) 0.97 (0.72-1.30) ...
 Obesity (≥ 30) 0.78 (0.58-1.06) ...
Prior nonlung malignancy
 No Reference Reference
 Yes 0.94 (0.72-1.21) 0.85 (0.67-1.07)
Smoking status
 Never Reference Reference
 Ever (former or current) 4.22 (3.04-5.85) 3.22 (2.34-4.43)
Nodule size, mm
 ≤ 4 Reference ...
 > 4-≤ 6 1.56 (0.75-3.24) ...
 > 6-≤ 8 3.55 (1.76-7.17) ...
 > 8-≤ 10 5.39 (2.61-11.13) ...
 > 10-≤ 20 23.12 (13.13-40.73) ...
 > 20-≤ 30 47.36 (26.60-84.33) ...
No. of nodules
 Single Reference Reference
 Multiple 0.93 (0.74-1.18) 0.81 (0.65-1.00)
Nodule location
 Nonupper lobe Reference Reference
 Upper lobe 2.07 (1.63-2.63) 1.67 (1.34-2.09)
Nodule laterality
 Left Reference Reference
 Right 0.78 (0.62-0.99) 1.03 (0.83-1.27)
Health system
 A Reference Reference
 B 1.73 (1.29-2.30) 1.19 (0.91-1.56)
Year of nodule diagnosis 0.96 (0.92-0.99) 1.01 (0.97-1.04)

RR = relative risk.

^a When modelled using splines, the 2-y probability of lung cancer varied significantly by age (P = .003), BMI (P = .020), and nodule size (P < .001) after adjusting for all variables in Table 3.
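
For a single binary factor, the univariate relative risk from a log-link binomial model equals the ratio of the two observed risks, with a Wald interval computed on the log scale. The sketch below applies that shortcut to the smoking row, using counts from Tables 1 and 2. The function name is ours, and the closed form is a stand-in for the regression fit rather than the authors' code.

```python
import math

def relative_risk(events_exposed, n_exposed, events_ref, n_ref, z=1.959964):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    se_log = math.sqrt(1 / events_exposed - 1 / n_exposed
                       + 1 / events_ref - 1 / n_ref)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Ever smokers (223 of 4,034) vs never smokers (42 of 3,206).
rr, lo, hi = relative_risk(223, 4034, 42, 3206)
print(f"RR {rr:.2f} ({lo:.2f}-{hi:.2f})")
```

The result agrees with the univariate estimate for smoking status in Table 3 to two decimal places.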

A multivariate analysis revealed that female sex, ever-smoking status, and upper lobe nodule location were associated with a higher 2-year probability of lung cancer (Table 3). Likelihood ratio tests revealed a relationship between the probability of lung cancer and age (P = .003), BMI (P = .020), and nodule size (P < .001). Prior nonlung malignancy, nodule number or laterality, health system, and year of nodule diagnosis were not associated with a lung cancer diagnosis.
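
The likelihood ratio tests for the spline-modelled variables take the usual nested-model form, sketched here in our own notation. The full model contains the spline terms for the variable of interest, the reduced model omits them, and k is the number of spline coefficients removed. The value of k depends on the spline basis and is not stated in the text.

```latex
\Lambda \;=\; 2\,\bigl[\ell(\hat\beta_{\text{full}}) - \ell(\hat\beta_{\text{reduced}})\bigr]
\;\sim\; \chi^2_{k} \quad \text{under } H_0
```

A small P value indicates that the spline terms improve model fit, which the authors treat as evidence of an association.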

For all three lung cancer prediction models developed—full model (age, sex, BMI, prior nonlung malignancy, smoking status, and nodule size, location, number, and laterality), significant factors model (age, sex, BMI, smoking status, and nodule size and location), and parsimonious model (nodule size)—the optimism-correction resulted in lower AUC values (Table 4). The AUC values were 0.75 (95% CI, 0.72-0.80) for the full model, 0.75 (95% CI, 0.72-0.80) for the significant factors model, and 0.70 (95% CI, 0.65-0.75) for the parsimonious model (e-Fig 1). Visual inspection of calibration plots revealed that the full model showed the least amount of deviation between predicted and observed probabilities of lung cancer (e-Fig 2). The significant factors and parsimonious models both underestimated and overestimated the probability of lung cancer. The parsimonious model showed the greatest degree of deviation between predicted and observed probability of lung cancer.

Table 4.

Area Under the Curve Estimates for Three Exploratory Models Predicting Lung Cancer

Variable | Naïve Area Under the Curve | Optimism-Corrected Area Under the Curve (95% CI)
Full model^a | 0.87 | 0.75 (0.72-0.80)
Significant factors model^b | 0.88 | 0.75 (0.72-0.80)
Parsimonious model^c | 0.85 | 0.70 (0.65-0.75)

^a Includes age, sex, BMI, prior nonlung malignancy, smoking status, and nodule size, laterality, location, and number.

^b Includes age, sex, BMI, smoking status, nodule location, and nodule size.

^c Includes nodule size only.

Discussion

We demonstrated that lung cancer was uncommon (3.7%) among a population-based cohort of individuals with incidentally detected lung nodules. Factors traditionally associated with lung cancer—age, sex, BMI, ever smoker status, nodule size, and nodule location—were associated with lung cancer in this cohort. Novel findings include nonlinear relationships between lung cancer and age and BMI and no association between lung cancer and a history of a prior nonlung malignancy or number of nodules. Two exploratory prediction models showed good discrimination and were well calibrated.

Findings from this study may influence how clinicians care for and how practice guideline committees consider individuals with an incidentally detected lung nodule and a prior history of a nonlung malignancy. In studies of individuals with incidentally detected lung nodules, the frequency of prior malignancy varied from 8% to 37%.11,13,14,30 Management of this segment of the lung nodule population is complex for two reasons: (1) the nodule may be recurrent, metastatic cancer from a nonlung primary and (2) prior studies suggest that prior malignancy is associated with a higher risk of lung cancer. Our study suggests no association between prior nonlung malignancy and a higher or lower risk of lung cancer. However, the absence of a relationship may be confounded by the type of prior nonlung malignancy, stage at presentation, and disease-free interval. Until additional evidence about the relationship between a prior nonlung malignancy and lung cancer becomes available, the complexity of care will remain high for this sizeable segment of the population of individuals with incidentally detected lung nodules. From a practice guideline perspective, one organization does not consider patients with a prior malignancy to be eligible for guideline-recommended nodule evaluation, whereas another organization is less restrictive.6,7 Clinicians and guideline committees could consider referral to a multidisciplinary clinic or a subspecialist for higher-level care.

Findings from this study also may influence how clinicians care for, and practice guideline committees consider, individuals with multiple lung nodules. Patients with multiple incidentally detected lung nodules are common: 57% in our population and 61% in another cohort.30 Some consider multiple nodules to be associated with a higher probability of lung cancer.7 We found no evidence of an association between multiple nodules and a higher or lower probability of lung cancer. Multiple nodules also occur commonly in populations of individuals with screen-detected nodules, and multiple nodules were not associated with a higher or lower chance of lung cancer in screening trials.16, 17, 18 The preponderance of evidence does not show an association between multiple nodules and higher risk of lung cancer. Clinicians can reasonably forgo a more intensive surveillance strategy for individuals with multiple incidentally detected lung nodules. Guideline committees may consider eliminating risk stratification by nodule number and instead provide guidance on how to approach a differential diagnosis that includes synchronous lung cancer, T3 or T4 lung cancer, metastatic lung cancer, benign diagnoses (eg, sarcoidosis), or concurrent benign and malignant diagnoses. Given the high complexity of care for this significant segment of the population of individuals with incidentally detected lung nodules, clinicians and practice guidelines may consider referral to a multidisciplinary clinic or a subspecialist for higher-level care.

Findings from this study also may inform the development of new lung cancer prediction models. Available prediction models have variable accuracy across diverse settings,8 possibly because they were developed in highly selected cohorts. Table 5 (references 1, 2, 10, 11, 13, 14, 30-33) summarizes select studies to highlight common and heterogeneous findings across investigations of individuals with incidentally detected lung nodules. One potential advantage of developing a lung cancer prediction model among a population-based cohort is that it may perform more reliably across diverse settings. As a first step toward testing this hypothesis, we provide preliminary evidence in favor of two models that: (1) use readily available variables in the medical record that can be obtained from population-based samples, (2) leverage knowledge of nonlinear relationships between predictors and outcomes, and (3) are modestly parsimonious. Inspection of the CI of the AUC for two of our models (0.72-0.80) suggests that their performance is slightly inferior to, if not similar to, other commonly used models externally validated across diverse settings and contexts (AUCs ranging from 0.60 to 0.89).8 The smaller nodule size and lower probability of lung cancer in this cohort compared with others may have affected associations in this sample (and therefore model performance). An important cautionary note is that direct comparison of model performance across studies is challenging because the populations were different. 
For example, in some external validation studies of models designed for patients with incidentally detected lung nodules, 100% of the population had been ever smokers or the prevalence of lung cancer ranged between 44% and 75%.8 Future investigations will need to validate our model’s performance externally across a variety of care settings, as well as investigate the reasons why only one-third of clinicians use a prediction model, why physician intuition sometimes performs better than prediction models, and why clinicians deviate from guideline-recommended care despite having reliable estimates of the probability of lung cancer.31,34,35

Table 5.

Select Studies of Individuals With an Incidentally Detected Lung Nodule Demonstrating Both Common and Heterogeneous Findings

Variable | Farjah et al^2 | Gould et al^1 | Vachani et al^32 | Wiener et al^33 | Gould et al^11 | Swensen et al^10 | Tanner et al^31 | Verdial et al^30 | Herder et al^13 | Deppen et al^14
Context and setting | Population-based | Population-based | Population-based, nodule > 8 mm | Veterans Affairs | Veterans Affairs | Referral to quaternary institution | Referral to pulmonary clinic | Referral to multidisciplinary nodule clinic | Referred for PET imaging | Referred to thoracic surgery
No. | 7,240 | 68,998 | 23,780 | 300 | 375 | 629 | 377 | 113 | 106 | 492
Age,^a y | 66 | 63 | 65 | 66 | 66 | 61 | 65 | NR | 64 ± NR | 63 ± 13
Women, % | 56 | 56 | 53 | 6 | 2 | 51 | 55 | 58 | 36 | 49
Ever smokers, % | 56 | 56 | 46 | 86 | 94 | 68 | 73 | 69 | 65 | 77
BMI,^a kg/m2 | 29 | NR | 28 | NR | NR | NR | NR | NR | NR | 28 ± 6
Prior malignancy, % | 31 | NR | NA^b | NA^b | 17 | NA^b | NA^b | 9^c | 8 | 37
Nodule size,^a mm | 7 | NR | 16 | NR | 17 | 13 | 13 | NR | NR | 28 ± 19
Nodule spiculation, % | NR | NR | NR | 11 | NR | 15 | 31 | 26 | 35 | 45
Nodule density, %
 Solid | NR | NR | NA^d | NR | NR | NR | NR | 88 | NR | NR
 Part solid | NR | NR | NA^d | NR | NR | NR | NR | 11 | NR | NR
 Ground-glass opacity | NR | NR | NA^d | 13 | NR | NR | NR | 3 | NR | NR
Multiple nodules, % | 57 | NR | NR | NR | NA | NA | NR | 61 | NR | NA^a
Upper lobe location, % | 40 | NR | 38 | 36 | 50 | 44 | 47 | 51 | 58 | 59
Lung cancer diagnosis, % | 3.7 | 5.2 | 9.9 | 9.0 | 54 | 23 | 25 | 29 | 58 | 78

NA = not available; NR = not reported.

^a All continuous variables summarized by mean values (most commonly reported summary statistic).

^b Not available, because prior malignancy was an exclusion criterion.

^c Frequency of prior malignancy > 5 y prior to nodule detection.

^d Not available, because the frequency of missing data was > 20%.

This study has limitations. Despite examining patients from two large health systems in urban and rural regions and geographic areas with and without endemic fungi, our findings may have limited generalizability. Specifically, our findings may not be generalizable to non-White or Hispanic patients (underrepresented in our study population). We decided not to include race and ethnicity in our prediction models because we could not disentangle the influence of society and biology (if any) on observed relationships between race, ethnicity, and lung cancer. Although a societal influence on the relationships between race and ethnicity and lung cancer may be used to predict lung cancer, we believed that the harms of doing so likely would outweigh any potential benefit.36 Another limitation of our study is that we were unable to measure several variables of interest, such as pack-years, years quit, and other nodule characteristics because clinicians inconsistently and infrequently document these variables in the medical record.22,37,38 As a result, we were unable to compare this cohort with others with information on these variables, and we were unable to compare our model performance directly with that of other models that used these variables. In some cases, we were unable to obtain more granular details, such as the time between prior nonlung malignancy and nodule detection, type of prior malignancy, the actual number of nodules, the size of additional nodules, or nodule density.22 For example, radiologists documented nodule density in only 7.7% of individuals with an incidentally detected lung nodule.22 Consequently, we may have misclassified relationships between patient and nodule characteristics and the diagnosis of lung cancer. Standardized reporting of nodule characteristics may facilitate patient care and future research efforts. Another limitation is the 2-year follow-up period, which is unlikely to be long enough for individuals with nonsolid nodules. 
A prior study reported that of 22,640 patients with a screen-detected lung nodule, 2,392 had a nonsolid nodule, of which 73 had a lung cancer diagnosis within 10 years of nodule detection.39 These estimates suggest we may be underestimating the probability of lung cancer by 0.3% (8% on a relative scale). Another limitation is that we did not measure the probability of a nonlung malignancy. A prior study reported 2 of 113 individuals with an incidentally detected lung nodule had a nonlung primary malignancy (1.8%)30; however, this cohort excluded individuals with a prior history of malignancy, so the probability of nonlung primary likely is higher than 1.8%. Finally, our prediction models only had two variables—nodule size and smoking status—with strong associations with lung cancer as defined by Pepe et al27 as an OR of > 3. Consequently, peak performance may not exceed clinician intuition.35 Our significant factors model was most parsimonious without performance decrements, but it included BMI, which was the most common missing variable in our data set. If BMI also is missing frequently in other clinical settings and populations, then model use may be limited.
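
The estimate of underestimation attributed to nonsolid nodules in the paragraph above can be reconstructed roughly as follows. This is one plausible reading of the authors' arithmetic, using rounded values, and not a calculation given in the text.

```latex
\frac{73}{22{,}640} \approx 0.0032 \;\;(\approx 0.3\ \text{percentage points}),
\qquad
\frac{0.3\%}{3.7\%} \approx 0.08 \;\;(\approx 8\%\ \text{on a relative scale})
```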

Interpretation

From a population-based perspective, lung cancer is uncommon among individuals with incidentally detected lung nodules. Two novel findings from this study are the absence of evidence of an association between a history of prior nonlung malignancy and the chance of lung cancer, and between multiple nodules and the chance of lung cancer. Both a history of prior nonlung malignancy and multiple nodules occur commonly in populations of individuals with incidentally detected lung nodules. Generalist clinicians and practice guideline committees may consider referral to a multidisciplinary nodule clinic or subspecialist. Prediction models developed using readily available data from a population-based cohort are promising, but require external validation and comparison with existing models across various practice settings and patient populations before use in clinical practice.

Funding/Support

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health [Grant R01CA207375].

Financial/Nonfinancial Disclosures

The authors have reported to CHEST the following: M. K. G. reports receiving royalties from UpToDate to coauthor topics on lung cancer diagnosis and staging and research support through his employer from Medial EarlySign to develop machine learning models of lung cancer risk. None declared (F. F., S. E. M., R. T. G., R. S.-B., M. P. B., K. S., A. R., D. S. M. B.).

Acknowledgments

Author contributions: F. F. had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis, including and especially any adverse effects. F. F., S. E. M., R. T. G., M. K. G., R. S.-B., M. P. B., K. S., A. R., and D. S. M. B. contributed substantially to the study design, data analysis and interpretation, and writing of the manuscript.

Role of sponsors: The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Other contributions: The authors thank Caroline Shevrin, MS, for technical editing.

Additional information: The e-Appendix, e-Figures, and e-Tables are available online under “Supplementary Data.”

Supplementary Data

e-Online Data
mmc1.docx (308.8KB, docx)
Audio
Download audio file (33.1MB, mp3)

References

  • 1. Gould M.K., Tang T., Liu I.L., et al. Recent trends in the identification of incidental pulmonary nodules. Am J Respir Crit Care Med. 2015;192(10):1208–1214. doi: 10.1164/rccm.201505-0990OC.
  • 2. Farjah F., Monsell S.E., Gould M.K., et al. Association of the intensity of diagnostic evaluation with outcomes in incidentally detected lung nodules. JAMA Intern Med. 2021;181(4):480–489. doi: 10.1001/jamainternmed.2020.8250.
  • 3. Slatore C.G., Press N., Au D.H., Curtis J.R., Wiener R.S., Ganzini L. What the heck is a “nodule”? A qualitative study of veterans with pulmonary nodules. Ann Am Thorac Soc. 2013;10(4):330–335. doi: 10.1513/AnnalsATS.201304-080OC.
  • 4. Slatore C.G., Golden S.E., Ganzini L., Wiener R.S., Au D.H. Distress and patient-centered communication among veterans with incidental (not screen-detected) pulmonary nodules. A cohort study. Ann Am Thorac Soc. 2015;12(2):184–192. doi: 10.1513/AnnalsATS.201406-283OC.
  • 5. Wiener R.S., Gould M.K., Woloshin S., Schwartz L.M., Clark J.A. What do you mean, a spot? A qualitative analysis of patients’ reactions to discussions with their physicians about pulmonary nodules. Chest. 2013;143(3):672–677. doi: 10.1378/chest.12-1095.
  • 6. Gould M.K., Donington J., Lynch W.R., et al. Evaluation of individuals with pulmonary nodules: when is it lung cancer? Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest. 2013;143(5 suppl):e93S–e120S. doi: 10.1378/chest.12-2351.
  • 7. MacMahon H., Naidich D.P., Goo J.M., et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner Society 2017. Radiology. 2017;284(1):228–243. doi: 10.1148/radiol.2017161659.
  • 8. Choi H.K., Ghobrial M., Mazzone P.J. Models to estimate the probability of malignancy in patients with pulmonary nodules. Ann Am Thorac Soc. 2018;15(10):1117–1126. doi: 10.1513/AnnalsATS.201803-173CME.
  • 9. Gurney J.W., Lyddon D.M., McKay J.A. Determining the likelihood of malignancy in solitary pulmonary nodules with Bayesian analysis. Part II. Application. Radiology. 1993;186(2):415–422. doi: 10.1148/radiology.186.2.8421744.
  • 10. Swensen S.J., Silverstein M.D., Ilstrup D.M., Schleck C.D., Edell E.S. The probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules. Arch Intern Med. 1997;157(8):849–855.
  • 11. Gould M.K., Ananth L., Barnett P.G., Veterans Affairs SNAP Cooperative Study Group. A clinical model to estimate the pretest probability of lung cancer in patients with solitary pulmonary nodules. Chest. 2007;131(2):383–388. doi: 10.1378/chest.06-1261.
  • 12. Li Y., Wang J. A mathematical model for predicting malignancy of solitary pulmonary nodules. World J Surg. 2012;36(4):830–835. doi: 10.1007/s00268-012-1449-8.
  • 13. Herder G.J., van Tinteren H., Golding R.P., et al. Clinical prediction model to characterize pulmonary nodules: validation and added value of 18F-fluorodeoxyglucose positron emission tomography. Chest. 2005;128(4):2490–2496. doi: 10.1378/chest.128.4.2490.
  • 14. Deppen S.A., Blume J.D., Aldrich M.C., et al. Predicting lung cancer prior to surgical resection in patients with lung nodules. J Thorac Oncol. 2014;9(10):1477–1484. doi: 10.1097/JTO.0000000000000287.
  • 15. Soardi G.A., Perandini S., Motton M., Montemezzi S. Assessing probability of malignancy in solid solitary pulmonary nodules with a new Bayesian calculator: improving diagnostic accuracy by means of expanded and updated features. Eur Radiol. 2015;25(1):155–162. doi: 10.1007/s00330-014-3396-2.
  • 16. Heuvelmans M.A., Walter J.E., Peters R.B., et al. Relationship between nodule count and lung cancer probability in baseline CT lung cancer screening: the NELSON study. Lung Cancer. 2017;113:45–50. doi: 10.1016/j.lungcan.2017.08.023.
  • 17. Walter J.E., Heuvelmans M.A., de Bock G.H., et al. Relationship between the number of new nodules and lung cancer probability in incidence screening rounds of CT lung cancer screening: the NELSON study. Lung Cancer. 2018;125:103–108. doi: 10.1016/j.lungcan.2018.05.007.
  • 18. McWilliams A., Tammemagi M.C., Mayo J.R., et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med. 2013;369(10):910–919. doi: 10.1056/NEJMoa1214726.
  • 19. Ross T.R., Ng D., Brown J.S., et al. The HMO research network virtual data warehouse: a public data model to support collaboration. EGEMS (Wash DC). 2014;2(1):1049. doi: 10.13063/2327-9214.1049.
  • 20. von Elm E., Altman D.G., Egger M., et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296. doi: 10.1371/journal.pmed.0040296.
  • 21. Danforth K.N., Early M.I., Ngan S., Kosco A.E., Zheng C., Gould M.K. Automated identification of patients with pulmonary nodules in an integrated health system using administrative health plan data, radiology reports, and natural language processing. J Thorac Oncol. 2012;7(8):1257–1262. doi: 10.1097/JTO.0b013e31825bd9f5.
  • 22. Farjah F., Halgrim S., Buist D.S., et al. An automated method for identifying individuals with a lung nodule can be feasibly implemented across health systems. EGEMS (Wash DC). 2016;4(1):1254. doi: 10.13063/2327-9214.1254.
  • 23. Smith L., Brinton L.A., Spitz M.R., et al. Body mass index and risk of lung cancer among never, former, and current smokers. J Natl Cancer Inst. 2012;104(10):778–789. doi: 10.1093/jnci/djs179.
  • 24. Sanikini H., Yuan J.M., Butler L.M., et al. Body mass index and lung cancer risk: a pooled analysis based on nested case-control studies from four cohort studies. BMC Cancer. 2018;18(1):220. doi: 10.1186/s12885-018-4124-0.
  • 25. Fine J.P., Gray R.J. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
  • 26. Wilson E.B. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927;22(158):209–212.
  • 27. Pepe M.S., Janes H., Longton G., Leisenring W., Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–890. doi: 10.1093/aje/kwh101.
  • 28. Harrell F.E., Jr., Lee K.L., Mark D.B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
  • 29. Iba K., Shinozaki T., Maruo K., Noma H. Re-evaluation of the comparative effectiveness of bootstrap-based optimism correction methods in the development of multivariable clinical prediction models. BMC Med Res Methodol. 2021;21(1):9. doi: 10.1186/s12874-020-01201-w.
  • 30. Verdial F.C., Madtes D.K., Cheng G.S., et al. Multidisciplinary team-based management of incidentally detected lung nodules. Chest. 2020;157(4):985–993. doi: 10.1016/j.chest.2019.11.032.
  • 31. Tanner N.T., Aggarwal J., Gould M.K., et al. Management of pulmonary nodules by community pulmonologists: a multicenter observational study. Chest. 2015;148(6):1405–1414. doi: 10.1378/chest.15-0630.
  • 32. Vachani A., Zheng C., Amy Liu I.L., Huang B.Z., Osuji T.A., Gould M.K. The probability of lung cancer in patients with incidentally detected pulmonary nodules: clinical characteristics and accuracy of prediction models. Chest. 2022;161(2):562–571. doi: 10.1016/j.chest.2021.07.2168.
  • 33. Wiener R.S., Gould M.K., Slatore C.G., Fincke B.G., Schwartz L.M., Woloshin S. Resource use and guideline concordance in evaluation of pulmonary nodules for cancer: too much and too little care. JAMA Intern Med. 2014;174(6):871–880. doi: 10.1001/jamainternmed.2014.561.
  • 34. Tanner N.T., Brasher P.B., Jett J., Silvestri G.A. Effect of a rule-in biomarker test on pulmonary nodule management: a survey of pulmonologists and thoracic surgeons. Clin Lung Cancer. 2020;21(2):e89–e98. doi: 10.1016/j.cllc.2019.05.004.
  • 35. Tanner N.T., Porter A., Gould M.K., Li X.J., Vachani A., Silvestri G.A. Physician assessment of pretest probability of malignancy and adherence with guidelines for pulmonary nodule evaluation. Chest. 2017;152(2):263–270. doi: 10.1016/j.chest.2017.01.018.
  • 36. Vyas D.A., Eisenstein L.G., Jones D.S. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874–882. doi: 10.1056/NEJMms2004740.
  • 37. Gould M.K., Sakoda L.C., Ritzwoller D.P., et al. Monitoring lung cancer screening use and outcomes at four cancer research network sites. Ann Am Thorac Soc. 2017;14(12):1827–1835. doi: 10.1513/AnnalsATS.201703-237OC.
  • 38. Peterson E., Harris K., Farjah F., Akinsoto N., Marcotte L.M. Improving smoking history documentation in the electronic health record for lung cancer risk assessment and screening in primary care: a case study. Healthc (Amst). 2021;9(4). doi: 10.1016/j.hjdsi.2021.100578.
  • 39. Yankelevitz D.F., Yip R., Smith J.P., et al. CT screening for lung cancer: nonsolid nodules in baseline and annual repeat rounds. Radiology. 2015;277(2):555–564. doi: 10.1148/radiol.2015142554.

Articles from Chest are provided here courtesy of American College of Chest Physicians
