Abstract
Background
At initial lung cancer diagnosis, intrapulmonary metastasis (IPM) usually reflects more advanced intrathoracic disease than single primary lung cancer (SPLC). However, the clinical and pathological characteristics associated with presenting as IPM rather than SPLC, and with more extensive IPM patterns, are not well described. This study aimed to characterise how sociodemographic and tumor features at diagnosis are associated with IPM compared with SPLC and with different intrapulmonary metastatic patterns in a large population-based registry.
Methods
We conducted a cross-sectional analysis of patients with non-small cell lung cancer in the Surveillance, Epidemiology, and End Results database from 2000 to 2019. IPM and SPLC were defined using the “Separate Tumor Nodules Ipsilateral Lung” recode. Bayesian network modeling and structural equation modeling were used to describe conditional association structures among sociodemographic variables, tumor characteristics, and lung cancer type. Simulated interventions in the Bayesian network yielded model-based risk ratios (RRs) with 95% confidence intervals (CIs) for IPM versus SPLC. Logistic regression was used in an exploratory subgroup analysis of IPM patterns comparing disease confined to the same lobe, disease in different lobes, and disease in both the same and different lobes.
Results
Among 45,194 patients, 9,302 had IPM and 35,892 had SPLC. In the Bayesian network, tumor grade and laterality showed the strongest direct associations with lung cancer type, and the model discriminated IPM from SPLC with an area under the curve of 0.919. Sociodemographic variables showed weaker and less consistent associations with lung cancer type after adjustment for tumor characteristics. Simulated interventions suggested progressively higher model-based risk of IPM with poorer differentiation (RR for poorly differentiated versus well-differentiated grade: 1.664, 95% CI: 1.571–1.772) and with right-sided disease (RR for right-sided versus left-sided disease: 1.136, 95% CI: 1.093–1.178). In subgroup analyses, higher grade and lower and middle lobe location were associated with IPM patterns involving multiple lobes.
Conclusions
In this large registry-based study, intrapulmonary metastatic disease at first lung cancer diagnosis was more strongly associated with tumor differentiation, laterality, and anatomical distribution than with measured sociodemographic factors. These observational associations may help characterise patients who present with more extensive intrapulmonary disease.
Keywords: Lung cancer, intrapulmonary metastasis (IPM), single primary lung cancer (SPLC), Bayesian network, structural equation modeling (SEM)
Highlight box.
Key findings
• In a large Surveillance, Epidemiology, and End Results (SEER) cohort, Bayesian network and structural equation modeling summarised how sociodemographic and tumor characteristics are associated with intrapulmonary metastasis (IPM) versus single primary lung cancer (SPLC) at first diagnosis, with tumor grade, laterality, and lobe location showing the strongest links to IPM.
• The Bayesian network discriminated IPM from SPLC with high accuracy, and model-based contrasts suggest higher probability of IPM with poorer differentiation and with right-sided or bilateral disease.
• In subgroup analyses of IPM patterns, involvement of different lobes or both the same and different lobes was more strongly associated with higher-grade and lower or middle lobe tumors than with sociodemographic characteristics.
What is known and what is new?
• IPM reflects more advanced disease and poorer prognosis than SPLC. Prior work has focused on treatment, survival, and diagnostic criteria distinguishing intrapulmonary spread from separate primary tumors.
• This study provides a cross-sectional description of how grade, laterality, lobe location and other clinicopathological features recorded in SEER are associated with presenting as IPM versus SPLC and with different intrapulmonary metastatic patterns among patients with IPM.
What is the implication, and what should change now?
• Assessment of tumor differentiation, laterality, and lobe-specific involvement at initial diagnosis may help clinicians recognise patients more likely to have intrapulmonary metastatic disease.
• Multidisciplinary teams can use these association patterns to support decision making in complex cases. Registries should consistently record lobe patterns and integrate imaging or molecular data so that these associations can be refined and validated.
Introduction
Lung cancer remains one of the leading causes of cancer-related mortality worldwide (1). Its development reflects the combined influence of genetic, environmental, and clinical factors (2,3). With the widespread use of low-dose chest computed tomography (CT) screening, an increasing number of patients are being diagnosed with multiple ground-glass nodules (GGNs), including GGNs that may represent very early or subclinical disease (4-6). In clinical practice, these nodules may correspond to a single primary lung cancer (SPLC) or to a more advanced presentation in which cancer cells from an index tumor have already spread within the lung to form additional intrapulmonary metastasis (IPM) (7,8). Distinguishing these diagnostic patterns at the time of first lung cancer diagnosis is crucial because IPM usually indicates higher tumor burden, more advanced stage, and poorer clinical outcomes compared with SPLC.
Most population-based and clinical studies have examined SPLC in isolation and have focused on incidence, treatment, and survival in non-small cell lung cancer (9,10). In contrast, less attention has been paid to the characteristics of patients who already have IPM at their initial diagnosis. Existing work has mainly evaluated survival after resection, patterns of recurrence, or the performance of pathologic and imaging criteria used to separate intrapulmonary spread from separate primary tumors (11-13). As a result, there is limited evidence on how routine sociodemographic and tumor characteristics at first diagnosis differ between patients recorded as SPLC and those recorded as IPM in large cancer registries. Furthermore, among patients with IPM, even less is known about how patterns of lobe involvement relate to more aggressive intrapulmonary disease. These patterns include metastases confined to one lobe, metastases in different lobes, and metastases involving both the same and different lobes. The Surveillance, Epidemiology, and End Results (SEER) program provides large-scale data on sociodemographic variables and tumor characteristics (14,15). A detailed description of how sociodemographic features and tumor characteristics are jointly associated with SPLC versus IPM at initial presentation can improve clinical understanding of which patients tend to present with intrapulmonary metastatic disease rather than a single primary tumor. In addition, examining how these characteristics relate to different intrapulmonary metastatic patterns within the IPM group may provide further insight into the clinical profile of more extensive intrapulmonary spread.
Bayesian networks and structural equation modeling (SEM) are complementary multivariable approaches that can summarize complex association structures in such data. Bayesian networks represent conditional dependence relationships among variables in a probabilistic graphical form and can be used to explore how changes in one characteristic are associated with changes in the predicted distribution of others (16-19). SEM provides a flexible framework to represent direct and indirect associations among observed variables and an outcome and has been widely used in cancer epidemiology to describe patterns in covariance structures (20-22). In observational cross-sectional data, these methods do not establish causality but can offer an integrated view of how sociodemographic and tumor characteristics co-vary with diagnostic patterns.
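Bayesian networks rest on the standard factorization shown below. The notation is ours and is not taken from this study. The joint distribution of variables $X_1,\dots,X_n$ factorizes over the directed acyclic graph into each variable's conditional distribution given its parents $\mathrm{Pa}(X_i)$. As a simplified example, suppose lung cancer type $Y$ had only grade $G$ and laterality $L$ as parents. Then the table $P(Y \mid G, L)$ would be the only component the network needs to store for $Y$.

$$
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
$$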
In this study, we conducted a cross-sectional analysis of SEER data from 2000 to 2019 to characterize how sociodemographic and tumor features at diagnosis are associated with SPLC and IPM. Our primary objective was to describe the associations between these features and the diagnostic pattern recorded as SPLC or IPM at the time of first lung cancer diagnosis. As a secondary objective, among patients with IPM and known lobe status, we explored how these characteristics are related to different intrapulmonary metastatic patterns defined by lobe involvement, in order to supplement clinical knowledge about the profiles of more aggressive intrapulmonary metastatic disease. We present this article in accordance with the STROBE reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1085/rc).
Methods
Study cohort, data processing and baseline analysis
In this retrospective study, we retrieved clinicopathological data from patients with confirmed lung cancer diagnoses in the 17 SEER registries (November 2021 submission) using SEER*Stat 8.4.4 software. The variables were age, sex, race, pathology, location, grade, laterality, tumor node metastasis (TNM) stage, T stage, N stage, M stage, median household income, position on the rural-urban continuum, and marital status. The site recode International Classification of Diseases for Oncology, third edition (ICD-O-3)/World Health Organization (WHO) 2008 was set to “Lung and Bronchus”. The inclusion criteria were as follows: (I) diagnosis between 2000 and 2019; (II) age over 18 years; (III) histologically confirmed non-small cell adenocarcinoma, non-small cell neuroendocrine carcinoma (NEC), non-small cell neuroendocrine tumors (NETs), or neuroendocrine carcinoma; (IV) the record corresponding to the patient’s first primary tumor, with sequence numbers marked as “1st of 2 or more primaries” or “One primary only”; (V) laterality recorded as “Right-origin of primary”, “Bilateral, single primary”, or “Left-origin of primary”. The exclusion criteria were as follows: (I) prior malignancies before the diagnosis of primary lung cancer; (II) incomplete information. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
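The eligibility rules above can be expressed as a simple record filter. This Python sketch is illustrative only. The `SeerRecord` fields and the histology labels are hypothetical stand-ins for the corresponding SEER*Stat export columns. The sequence-number and laterality labels are copied from the criteria.

```python
from dataclasses import dataclass


@dataclass
class SeerRecord:
    # Hypothetical field names; real SEER*Stat exports use their own column labels.
    year_of_diagnosis: int
    age: int
    histology_group: str
    sequence_number: str
    laterality: str
    prior_malignancy: bool
    has_missing_fields: bool


# Hypothetical labels for the four eligible histology groups (criterion III).
ELIGIBLE_HISTOLOGY = {
    "Non-small cell adenocarcinoma",
    "Non-small cell neuroendocrine carcinoma",
    "Non-small cell neuroendocrine tumor",
    "Neuroendocrine carcinoma",
}
# Labels copied from criteria (IV) and (V).
ELIGIBLE_SEQUENCE = {"1st of 2 or more primaries", "One primary only"}
ELIGIBLE_LATERALITY = {
    "Right-origin of primary",
    "Bilateral, single primary",
    "Left-origin of primary",
}


def is_eligible(r: SeerRecord) -> bool:
    """Apply inclusion criteria (I)-(V), then exclusion criteria (I)-(II)."""
    included = (
        2000 <= r.year_of_diagnosis <= 2019          # (I) diagnosis period
        and r.age > 18                               # (II) age over 18 years
        and r.histology_group in ELIGIBLE_HISTOLOGY  # (III) histology
        and r.sequence_number in ELIGIBLE_SEQUENCE   # (IV) first primary record
        and r.laterality in ELIGIBLE_LATERALITY      # (V) laterality
    )
    excluded = r.prior_malignancy or r.has_missing_fields
    return included and not excluded
```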
Definition of IPM and SPLC
The “Separate Tumor Nodules Ipsilateral Lung” recode for SPLC was set as “None; No intrapulmonary mets; Foci in situ/minimally invasive adenocarcinoma”. For IPM, the recode was set as “Separate nodules of same hist type in ipsilateral lung, different lobe”, “Separate nodules of same hist type in ipsilateral lung, same lobe”, “Separate nodules of same hist type in ipsilateral lung, same AND different lobes”, and “Separate tumor nodules, ipsilateral lung, unknown if same or different lobe”.
Bayesian network analysis
We employed a Bayesian network to describe the joint distribution of clinical variables and lung cancer type and to explore conditional dependence structures. The network structure was learned using the hill-climbing search algorithm (pgmpy.estimators), with the Bayesian information criterion (BIC) as the scoring function. To respect a plausible temporal and clinical ordering, we constrained structure learning using three ordered tiers of variables: birth variables, sociodemographic variables, and tumor variables. Three types of edges were allowed:
- from birth variables to sociodemographic and tumor variables;
- from sociodemographic variables to other sociodemographic variables and to tumor variables;
- from tumor variables to other tumor variables.

Edges from later tiers back to earlier tiers were prohibited, as were edges from composite TNM stage to its components T, N, and M stages. Directed edges were therefore interpreted as conditional dependence relationships within the fitted probabilistic model rather than as evidence of causal effects. The performance of the Bayesian network in classifying lung cancer type was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), calculated with sklearn and matplotlib.
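The tier constraints described above can be sketched as a set of forbidden edges. In this Python sketch, the variable names and the assignment of variables to tiers are our assumptions, because the text does not list tier membership explicitly. Only the two rules come from the Methods: no edges from a later tier to an earlier one, and no edges from composite TNM stage to its components.

```python
from itertools import permutations

# Hypothetical tier membership; variable names are illustrative.
TIERS = {
    "birth": ["sex", "race"],
    "sociodemographic": ["age", "marital_status", "income", "rural_urban"],
    "tumor": ["pathology", "location", "grade", "laterality",
              "tnm_stage", "t_stage", "n_stage", "m_stage", "lung_cancer_type"],
}
TIER_ORDER = ["birth", "sociodemographic", "tumor"]


def forbidden_edges(tiers=TIERS, order=TIER_ORDER):
    """Return the set of directed edges (parent, child) that structure learning may not add."""
    rank = {var: order.index(tier) for tier, members in tiers.items() for var in members}
    banned = set()
    for parent, child in permutations(rank, 2):
        # No edges from a later tier back into an earlier tier.
        if rank[parent] > rank[child]:
            banned.add((parent, child))
    # Composite TNM stage may not point to its own components.
    for component in ("t_stage", "n_stage", "m_stage"):
        banned.add(("tnm_stage", component))
    return banned
```

The resulting set could then be passed to a hill-climbing search. In pgmpy, `HillClimbSearch(...).estimate` accepts a `black_list` argument for this purpose. Argument and score names have changed across pgmpy releases, so check the documentation for the installed version.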
SEM analysis
We then fitted an SEM as a path analysis representation of the main conditional associations suggested by the Bayesian network. Residual covariances were allowed between TNM, T, N, and M stages. The model was fitted with maximum likelihood in the semopy package. Global fit indices, including goodness of fit index (GFI), Chi-squared statistics, comparative fit index (CFI), Tucker and Lewis index (TLI), normed fit index (NFI), adjusted GFI (AGFI), root mean square error of approximation (RMSEA), and standardised root mean square residual (SRMR), were used to describe the concordance between the model-implied and observed covariance structures. Given the cross-sectional design and imperfect fit, the SEM was interpreted as a descriptive association model rather than a validated causal structure.
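Several of the reported fit indices are simple functions of the chi-squared statistics of the fitted model and of the baseline (independence) model. The sketch below uses common textbook definitions. Packages such as semopy may differ in details, for example using N versus N−1 in RMSEA. GFI, AGFI and SRMR are omitted because they require the residual covariance matrix. The numbers in the test are invented for illustration and are not the study's values.

```python
import math


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Chi-squared-based fit indices for a model (m) against the independence baseline (b).

    Textbook formulas; software implementations may differ slightly.
    """
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    nfi = (chi2_b - chi2_m) / chi2_b
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    # CFI compares the model's noncentrality with the baseline's; 1.0 if neither misfits.
    denom = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / denom if denom > 0 else 1.0
    return {"RMSEA": rmsea, "NFI": nfi, "TLI": tli, "CFI": cfi}
```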
Simulated intervention
Conditional probabilities were estimated using maximum likelihood, and inference was performed with variable elimination. We used the fitted network to evaluate how the model-implied probability of IPM versus SPLC changed when specific variables were fixed at different levels while the empirical distribution of the other variables was left unchanged. These contrasts represent differences within the fitted joint distribution and are interpreted as statistical associations. We contrasted each level with a clinically relevant reference level and derived model-based risk ratios (RRs) and risk differences for IPM versus SPLC. For each of one thousand bootstrap samples, we refitted the Bayesian network under the same structural constraints and recomputed the RRs and risk differences. Two-sided P values were derived from the bootstrap distributions, and confidence intervals (CIs) were obtained from bootstrap percentiles. Results were displayed as forest plots.
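A toy example can illustrate the logic of a simulated intervention with bootstrap CIs. For simplicity, this Python sketch assumes that lung cancer type depends only on grade and laterality. Under that assumption, fixing grade amounts to averaging the maximum-likelihood estimate of P(IPM | grade, laterality) over the observed laterality mix. The actual analysis used variable elimination over the full network, refitted to each bootstrap sample. The helper names and the P value rule (twice the smaller share of bootstrap RRs on either side of 1) are our assumptions.

```python
import random
from collections import Counter


def standardized_risk(records, grade):
    """Model-based P(IPM) with grade fixed at `grade`.

    records: list of (grade, laterality, is_ipm) tuples with is_ipm in {0, 1}.
    Illustrative assumption: grade and laterality are the only parents of lung
    cancer type, and laterality is not a descendant of grade.
    """
    n = len(records)
    risk = 0.0
    for lat, count in Counter(r[1] for r in records).items():
        # Maximum-likelihood estimate of P(IPM | grade, laterality): a stratum proportion.
        stratum = [ipm for g, l, ipm in records if g == grade and l == lat]
        if not stratum:
            raise ValueError(f"no records with grade={grade!r}, laterality={lat!r}")
        risk += (sum(stratum) / len(stratum)) * (count / n)
    return risk


def bootstrap_rr(records, level, reference, n_boot=1000, seed=0):
    """Standardized RR (level vs reference), 95% percentile CI, and a bootstrap P value."""
    point = standardized_risk(records, level) / standardized_risk(records, reference)
    rng = random.Random(seed)
    rrs = []
    for _ in range(n_boot):
        # Resample patients with replacement and re-estimate the conditional probabilities.
        sample = rng.choices(records, k=len(records))
        rrs.append(standardized_risk(sample, level) / standardized_risk(sample, reference))
    rrs.sort()
    lower, upper = rrs[int(0.025 * n_boot)], rrs[int(0.975 * n_boot) - 1]
    # Two-sided P value: twice the smaller share of bootstrap RRs on either side of 1.
    share_below = sum(rr <= 1.0 for rr in rrs) / n_boot
    p_value = min(1.0, 2 * min(share_below, 1 - share_below))
    return point, lower, upper, p_value
```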
Subgroup analysis of IPM patterns
As an exploratory subgroup analysis, we restricted the cohort to patients with IPM and examined factors associated with different patterns of intrapulmonary disease. Two separate binary logistic regression models were fitted. The first compared IPM in different lobes with IPM in the same lobe, and the second compared IPM in the same and different lobes with IPM in the same lobe. The same set of variables as in the main analyses was used. Age was entered as a continuous variable. Grade, income, rural-urban continuum, and TNM, T, N, and M stages were treated as ordered variables using the clinical order. Race, sex, pathology, tumor location, laterality, and marital status were treated as nominal variables and were coded with indicator variables. The reference categories were White race, male, non-small cell adenocarcinoma, upper lobe location, left-sided laterality, and married status. Odds ratios (ORs) with 95% CIs and P values were reported for each comparison. These models were used to describe how sociodemographic and tumor characteristics were associated with more extensive intrapulmonary metastatic patterns relative to the reference group with IPM in the same lobe.
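The sketch below shows the variable coding described above, together with the conversion of a logistic-regression coefficient into an OR with a Wald 95% CI. The grade order and the indicator names are our assumptions. We order grade toward poorer differentiation, which is consistent with the per-grade-level ORs reported in the Results. The reference category for location (upper lobe) follows the Methods.

```python
import math

# Assumed clinical order: the score increases toward poorer differentiation.
GRADE_ORDER = ["Well-differentiated", "Moderately differentiated",
               "Poorly differentiated", "Undifferentiated"]
# Nominal location levels; the first entry (upper lobe) is the reference category.
LOCATION_LEVELS = ["Upper lobe", "Middle lobe", "Lower lobe",
                   "Main bronchus", "Overlapping lesion of lung"]


def encode(grade, location):
    """Ordered grade -> integer score; nominal location -> indicators versus upper lobe."""
    row = {"grade": GRADE_ORDER.index(grade)}
    for level in LOCATION_LEVELS[1:]:
        row[f"location[{level}]"] = int(location == level)
    return row


def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

For instance, the reported OR of 1.777 (95% CI: 1.423–2.219) corresponds to a coefficient of about 0.575 and a standard error of about 0.113 on the log-odds scale. Passing these values to `odds_ratio_ci` recovers the published OR and CI to rounding.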
Statistical analyses
Continuous variables were summarized as means with standard deviations, and categorical variables as counts and percentages. Comparisons between the SPLC and IPM groups used chi-squared tests for categorical variables and analysis of variance for continuous variables. Analyses were performed in Python 3.12 and R 4.3.2, with a two-tailed P value less than 0.05 considered significant.
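To make the baseline comparisons concrete, the following Python sketch reproduces two of the Table 1 P values from the reported counts and summary statistics. It is our illustration, not the authors' code. The chi-squared test is computed without continuity correction. The P value for age uses a normal approximation to the t distribution, which is accurate at these sample sizes.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p


def two_group_anova(m1, s1, n1, m2, s2, n2):
    """One-way ANOVA for two groups from summary statistics (equivalent to a pooled t-test)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Large-sample normal approximation to the t distribution for the P value.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t**2, p  # with two groups, the F statistic equals t squared


# Sex: rows SPLC / IPM, columns female / male (counts from Table 1).
sex_stat, sex_p = chi2_2x2(19512, 16380, 4918, 4384)
# Age: IPM versus SPLC (means, SDs, and group sizes from Table 1).
age_f, age_p = two_group_anova(67.24, 11.02, 9302, 66.93, 10.79, 35892)
```

With these inputs, the chi-squared statistic for sex is about 6.6 (P≈0.010). The age comparison gives F≈6.0 (P≈0.014). Both are consistent with the reported P=0.01 after rounding. Small discrepancies can arise because the published means and SDs are rounded.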
Results
Baseline analysis
A total of 45,194 lung cancer patients were included in this study: 9,302 with IPM and 35,892 with SPLC. The clinicopathological characteristics of the patients are summarized in Table 1. The mean ages of the IPM and SPLC groups were 67.24±11.02 and 66.93±10.79 years, respectively (P=0.01). The two groups differed significantly in sex, race, pathology, location, grade, laterality, TNM stage, T stage, N stage, M stage, marital status, and median household income. No significant difference was observed in position on the rural-urban continuum (P=0.17).
Table 1. Clinicopathological characteristics of lung cancer patients diagnosed with IPM and SPLC.
| Characteristics | Cohort (n=45,194) | SPLC (n=35,892) | IPM (n=9,302) | P value |
|---|---|---|---|---|
| Age, years | 66.99±10.84 | 66.93±10.79 | 67.24±11.02 | 0.01 |
| Female | 24,430 (54.06) | 19,512 (54.36) | 4,918 (52.87) | 0.01 |
| Race | | | | <0.001 |
| American Indian or Alaska Native | 160 (0.35) | 119 (0.33) | 41 (0.44) | |
| Asian or Pacific Islander | 4,017 (8.89) | 3,080 (8.58) | 937 (10.07) | |
| Black | 4,572 (10.12) | 3,564 (9.93) | 1,008 (10.84) | |
| White | 36,445 (80.64) | 29,129 (81.16) | 7,316 (78.65) | |
| Pathology | | | | <0.001 |
| Neuroendocrine carcinoma | 3,698 (8.18) | 2,719 (7.58) | 979 (10.52) | |
| Non-small cell NEC | 1,629 (3.6) | 1,292 (3.6) | 337 (3.62) | |
| Non-small cell NET | 1,494 (3.31) | 1,333 (3.71) | 161 (1.73) | |
| Non-small cell adenocarcinoma | 38,373 (84.91) | 30,548 (85.11) | 7,825 (84.12) | |
| Location | | | | <0.001 |
| Lower lobe | 13,937 (30.84) | 11,077 (30.86) | 2,860 (30.75) | |
| Main bronchus | 1,237 (2.74) | 872 (2.43) | 365 (3.92) | |
| Middle lobe | 2,529 (5.6) | 2,003 (5.58) | 526 (5.65) | |
| Overlapping lesion of lung | 571 (1.26) | 387 (1.08) | 184 (1.98) | |
| Upper lobe | 26,920 (59.57) | 21,553 (60.05) | 5,367 (57.7) | |
| Grade | | | | <0.001 |
| Undifferentiated | 2,872 (6.35) | 2,151 (5.99) | 721 (7.75) | |
| Poorly differentiated | 18,594 (41.14) | 14,050 (39.15) | 4,544 (48.85) | |
| Moderately differentiated | 15,831 (35.03) | 12,954 (36.09) | 2,877 (30.93) | |
| Well-differentiated | 7,897 (17.47) | 6,737 (18.77) | 1,160 (12.47) | |
| Laterality | | | | <0.001 |
| Bilateral | 49 (0.11) | 0 | 49 (0.53) | |
| Left | 18,259 (40.4) | 14,789 (41.2) | 3,470 (37.3) | |
| Right | 26,886 (59.49) | 21,103 (58.8) | 5,783 (62.17) | |
| TNM stage | | | | <0.001 |
| IA | 11,078 (24.51) | 11,078 (30.86) | 0 | |
| IB | 5,889 (13.03) | 5,889 (16.41) | 0 | |
| IIA | 2,917 (6.45) | 2,917 (8.13) | 0 | |
| IIB | 2,607 (5.77) | 1,444 (4.02) | 1,163 (12.5) | |
| IIIA | 6,149 (13.61) | 4,668 (13.01) | 1,481 (15.92) | |
| IIIB | 2,004 (4.43) | 1,329 (3.7) | 675 (7.26) | |
| IV | 14,550 (32.19) | 8,567 (23.87) | 5,983 (64.32) | |
| T stage | | | | <0.001 |
| T0 | 2 (0.0) | 2 (0.01) | 0 | |
| T1a | 8,790 (19.45) | 8,790 (24.49) | 0 | |
| T1b | 5,912 (13.08) | 5,912 (16.47) | 0 | |
| T2a | 11,251 (24.89) | 11,251 (31.35) | 0 | |
| T2b | 3,073 (6.8) | 3,073 (8.56) | 0 | |
| T3 | 8,318 (18.41) | 3,962 (11.04) | 4,356 (46.83) | |
| T4 | 7,848 (17.37) | 2,902 (8.09) | 4,946 (53.17) | |
| N stage | | | | <0.001 |
| N0 | 24,506 (54.22) | 21,540 (60.01) | 2,966 (31.89) | |
| N1 | 4,423 (9.79) | 3,547 (9.88) | 876 (9.42) | |
| N2 | 12,321 (27.26) | 8,454 (23.55) | 3,867 (41.57) | |
| N3 | 3,944 (8.73) | 2,351 (6.55) | 1,593 (17.13) | |
| M stage | | | | <0.001 |
| M0 | 30,644 (67.81) | 27,325 (76.13) | 3,319 (35.68) | |
| M1a | 3,624 (8.02) | 1,713 (4.77) | 1,911 (20.54) | |
| M1b | 10,926 (24.18) | 6,854 (19.1) | 4,072 (43.78) | |
| Marital status | | | | 0.01 |
| Divorced | 5,747 (12.72) | 4,560 (12.7) | 1,187 (12.76) | |
| Married | 25,325 (56.04) | 20,237 (56.38) | 5,088 (54.7) | |
| Separated | 527 (1.17) | 415 (1.16) | 112 (1.2) | |
| Single | 6,417 (14.2) | 5,001 (13.93) | 1,416 (15.22) | |
| Unmarried or domestic partner | 140 (0.31) | 117 (0.33) | 23 (0.25) | |
| Widowed | 7,038 (15.57) | 5,562 (15.5) | 1,476 (15.87) | |
| Median household income, $ | | | | <0.001 |
| <35,000 | 1,058 (2.34) | 868 (2.42) | 190 (2.04) | |
| 35,000–39,999 | 1,726 (3.82) | 1,367 (3.81) | 359 (3.86) | |
| 40,000–44,999 | 2,643 (5.85) | 2,101 (5.85) | 542 (5.83) | |
| 45,000–49,999 | 3,012 (6.66) | 2,379 (6.63) | 633 (6.8) | |
| 50,000–54,999 | 4,477 (9.91) | 3,592 (10.01) | 885 (9.51) | |
| 55,000–59,999 | 3,702 (8.19) | 2,963 (8.26) | 739 (7.94) | |
| 60,000–64,999 | 9,122 (20.18) | 7,066 (19.69) | 2,056 (22.1) | |
| 65,000–69,999 | 4,756 (10.52) | 3,803 (10.6) | 953 (10.25) | |
| 70,000–74,999 | 2,821 (6.24) | 2,275 (6.34) | 546 (5.87) | |
| ≥75,000 | 11,877 (26.28) | 9,478 (26.41) | 2,399 (25.79) | |
| Rural-urban continuum | | | | 0.17 |
| Nonmetropolitan counties not adjacent to a metropolitan area | 2,663 (5.89) | 2,120 (5.91) | 543 (5.84) | |
| Nonmetropolitan counties adjacent to a metropolitan area | 3,502 (7.75) | 2,821 (7.86) | 681 (7.32) | |
| Counties in metropolitan areas of less than 250 thousand people | 3,747 (8.29) | 2,969 (8.27) | 778 (8.36) | |
| Counties in metropolitan areas of 250,000 to 1 million people | 9,713 (21.49) | 7,764 (21.63) | 1,949 (20.95) | |
| Counties in metropolitan areas of 1 million people or more | 25,569 (56.58) | 20,218 (56.33) | 5,351 (57.53) | |
Data are presented as n (%) or mean ± SD. IPM, intrapulmonary metastasis; NEC, neuroendocrine carcinoma; NET, neuroendocrine tumors; SD, standard deviation; SPLC, single primary lung cancer; TNM, tumor node metastasis.
Bayesian network analysis
The Bayesian network captured the main conditional dependencies among sociodemographic and tumor-related variables and lung cancer type (Figure 1A). Sex and race were connected to marital status, age, median household income, and the rural-urban continuum. TNM, T, N, and M stages, grade, laterality, tumor location, pathology, and lung cancer type formed a clinically plausible cluster of tumor characteristics. In this structure, grade and laterality were directly linked to lung cancer type. When the Bayesian network was used as a probabilistic classifier, the AUC for distinguishing IPM from SPLC was 0.919 (Figure 1B), indicating good discrimination based on the joint pattern of clinical and pathological variables. This performance reflects the network’s ability to summarise multivariable associations.
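For reference, an AUC such as the one reported here can be computed from predicted IPM probabilities with the rank-based (Mann–Whitney) formula. The AUC equals the probability that a randomly chosen IPM case receives a higher score than a randomly chosen SPLC case, with ties counted as one half. This Python sketch is ours, not the authors' code. The authors report using sklearn, whose `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC; y_true uses 1 for IPM and 0 for SPLC."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one IPM (1) and one SPLC (0) case")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The predicted probabilities could come, for example, from querying P(lung cancer type | other variables) in the fitted network for each patient. Given those probabilities, `roc_auc(labels, probabilities)` should agree with `roc_auc_score(labels, probabilities)`.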
Figure 1.
Bayesian network model for lung cancer type. (A) Bayesian network showing conditional dependence relationships among sociodemographic variables, tumor characteristics, and lung cancer type. (B) ROC curve for the Bayesian network used as a probabilistic classifier of IPM versus SPLC, with corresponding AUC. AUC, area under the curve; BN, Bayesian network; IPM, intrapulmonary metastasis; M, metastasis; N, node; ROC, receiver operating characteristic; SPLC, single primary lung cancer; T, tumor.
SEM
To represent the main associations suggested by the Bayesian network in a conventional path analysis framework, we fitted an SEM with only observed variables (Figure 2A). The model included regressions of T stage on tumor location and lung cancer type, M stage on T stage and lung cancer type, N stage on M stage and T stage, grade on TNM stage, pathology on grade and location, location on laterality, lung cancer type on grade and laterality, marital status on race and sex, median household income on race and the rural-urban continuum, the rural-urban continuum on race, and age on race and marital status.
Figure 2.
SEM of associations among clinicopathological variables and lung cancer type. (A) Path diagram of the SEM including observed sociodemographic variables, tumor characteristics, and lung cancer type. (B) Histogram of standardised path coefficients. (C) Global fit indices for the SEM. ns, P>0.05; *, P<0.05; **, P<0.01; ***, P<0.001. AGFI, adjusted GFI; CFI, comparative fit index; GFI, goodness of fit index; NFI, normed fit index; ns, not significant; RMSEA, root mean square error of approximation; SEM, structural equation modeling; SRMR, standardised root mean square residual; TLI, Tucker and Lewis index; TNM, tumor node metastasis.
Most specified paths were statistically significant (Figure 2B, Table S1). Higher TNM stage was associated with poorer histological grade (estimate −0.1428, P<0.001), and poorer grade in turn was associated with higher probability of IPM relative to SPLC (lung cancer type on grade: estimate −0.0452, P<0.001, with grade coded from undifferentiated to well-differentiated). Lung cancer type was positively associated with T stage (estimate 1.6194, P<0.001) and negatively associated with M stage (estimate −0.5600, P=0.02), consistent with more advanced local disease among patients classified as IPM. Laterality showed a small but significant association with lung cancer type (estimate 0.0202, P<0.001), reflecting the strong concentration of bilateral disease within the IPM group. As expected, race was strongly related to the rural-urban continuum and median household income, and sex was associated with marital status (all P<0.001).
Global fit indices indicated only moderate concordance between the model-implied and observed covariance structures (Figure 2C). The CFI, NFI and GFI were all approximately 0.91; the TLI and AGFI were approximately 0.88; the RMSEA was 0.066; and the SRMR was 0.071. These values support treating the SEM as a descriptive association model that captures the main dependency patterns highlighted by the Bayesian network.
Simulated intervention
Using the fitted Bayesian network, we performed model-based simulated interventions to examine how the predicted probability of IPM versus SPLC changed when grade or laterality was fixed at different levels while the empirical distribution of other variables was kept unchanged. For histological grade, the model predicted a higher probability of IPM for moderately differentiated, poorly differentiated, and undifferentiated tumors than for well-differentiated tumors, with correspondingly lower probabilities of SPLC (Figure 3A). Moving from well-differentiated to moderately differentiated grade increased the model-based risk of IPM by approximately 24% (RR 1.235, 95% CI: 1.160–1.313). Moving from well-differentiated to poorly differentiated grade increased it by approximately 66% (RR 1.664, 95% CI: 1.571–1.772). In contrast, the RR comparing undifferentiated with poorly differentiated tumors was not statistically significant, suggesting that the association plateaus at the least differentiated end of the grade spectrum.
Figure 3.
Simulated intervention contrasts from the Bayesian network for IPM versus SPLC. (A) Forest plots of model-based RRs for IPM versus SPLC according to histological grade. (B) Forest plot of model-based RRs for IPM versus SPLC according to laterality. RRs and 95% CIs are derived from one thousand bootstrap samples of the fitted Bayesian network. CI, confidence interval; IPM, intrapulmonary metastasis; RR, risk ratio; SPLC, single primary lung cancer.
For laterality, bilateral lesions were associated with markedly higher model-based risk of IPM (Figure 3B). Fixing laterality to bilateral versus left produced an estimated RR for IPM of around five (RR 5.215, 95% CI: 4.987–5.411), and bilateral versus right produced an RR of around four and a half (RR 4.591, 95% CI: 4.419–4.736). Compared with left-sided disease, right-sided tumors also showed a modestly higher predicted probability of IPM (RR 1.136, 95% CI: 1.093–1.178) and a slightly lower probability of SPLC (RR 0.968, 95% CI: 0.960–0.978). These simulated intervention contrasts summarise how the fitted joint distribution links grade and laterality to the likelihood of IPM.
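As a simple arithmetic check, unadjusted risk ratios can be computed directly from the Table 1 counts, using the proportion of IPM within each grade or laterality level. This sketch is ours and is not part of the original analysis. Unlike the simulated interventions, it uses no model and provides no bootstrap CIs.

```python
# (IPM count, total count) per level, copied from Table 1.
GRADE = {
    "Well-differentiated": (1160, 7897),
    "Moderately differentiated": (2877, 15831),
    "Poorly differentiated": (4544, 18594),
    "Undifferentiated": (721, 2872),
}
LATERALITY = {
    "Left": (3470, 18259),
    "Right": (5783, 26886),
    "Bilateral": (49, 49),
}


def crude_rr(table, level, reference):
    """Unadjusted risk ratio of IPM for `level` versus `reference`."""
    ipm_level, n_level = table[level]
    ipm_ref, n_ref = table[reference]
    return (ipm_level / n_level) / (ipm_ref / n_ref)


for level in ("Moderately differentiated", "Poorly differentiated"):
    print(level, "vs well-differentiated:", round(crude_rr(GRADE, level, "Well-differentiated"), 3))
print("Right vs left:", round(crude_rr(LATERALITY, "Right", "Left"), 3))
```

The crude ratios are about 1.24 for moderately versus well-differentiated grade and 1.66 for poorly versus well-differentiated grade. Undifferentiated versus poorly differentiated grade gives about 1.03, and right versus left gives about 1.13. These values are close to the model-based RRs above. The agreement is a useful sanity check but does not independently confirm the model-based estimates.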
Subgroup analysis of IPM patterns
In an exploratory subgroup analysis restricted to patients with IPM, we further examined factors associated with different IPM patterns (Figure 4A,4B).

Higher histological grade, reflecting poorer differentiation, was associated with more extensive intrapulmonary spread. The association was weak for IPM in a different lobe versus IPM in the same lobe (OR per grade level 1.018, 95% CI: 1.005–1.030). It was stronger for IPM in the same and different lobes versus IPM in the same lobe (OR per grade level 1.777, 95% CI: 1.423–2.219).

Compared with upper lobe disease, lower and middle lobe lesions also showed higher odds of involving multiple lobes:
- Lower lobe: OR versus upper lobe 2.025 (95% CI: 1.488–2.755) for IPM in a different lobe versus the same lobe, and OR 1.832 (95% CI: 1.322–2.538) for IPM in the same and different lobes versus the same lobe.
- Middle lobe: OR 6.671 (95% CI: 2.668–16.677) for IPM in a different lobe versus the same lobe, and OR 3.478 (95% CI: 1.343–9.005) for IPM in the same and different lobes versus the same lobe.

In contrast, most sociodemographic variables, including race, sex, marital status, income, and rural-urban continuum, were not consistently associated with intrapulmonary metastatic pattern after adjustment for tumor characteristics. Overall, these subgroup findings suggest that, within this cohort, variation in IPM pattern is more strongly associated with tumor grade and anatomical distribution than with sociodemographic characteristics.
Figure 4.
Logistic regression analysis of intrapulmonary metastatic patterns. (A) Forest plot of adjusted ORs comparing IPM in a different lobe versus IPM in the same lobe. (B) Forest plot of adjusted ORs comparing IPM in the same and different lobes versus IPM in the same lobe. Grade, income, rural-urban continuum, and TNM, T, N, and M stages are treated as ordered variables. Race, sex, pathology, tumor location, laterality, and marital status are treated as nominal variables and coded with indicator variables, with White race, male, non-small cell adenocarcinoma, upper lobe location, left-sided laterality, and married status as reference categories. Thus, ORs for tumor location levels such as lower lobe or middle lobe represent the odds of having IPM in a different lobe versus the same lobe, or IPM in the same and different lobes versus the same lobe, relative to upper lobe tumors. CI, confidence interval; IPM, intrapulmonary metastasis; NEC, neuroendocrine carcinoma; NET, neuroendocrine tumors; OR, odds ratio; TNM, tumor node metastasis.
Discussion
In this study, we applied Bayesian network modeling and SEM to describe how clinicopathological variables relate to the probability of IPM compared with SPLC. The analyses consistently indicated that tumor grade, laterality, and anatomical distribution had the strongest associations with intrapulmonary metastatic spread. Sociodemographic characteristics showed much weaker and less consistent associations after adjustment for tumor features.
Higher histological grade was associated with a higher probability of IPM rather than SPLC in both the Bayesian network and the SEM. The simulated intervention analysis further illustrated that, within the fitted joint distribution, moving from well-differentiated to moderately or poorly differentiated grade was accompanied by substantial increases in the model-based risk of IPM, while SPLC became less likely. Taken together, these results are in line with the widely accepted view that less differentiated tumors tend to present with more extensive disease (23), although our findings should be interpreted as statistical associations conditioned on the available variables rather than as evidence of temporal progression.
Laterality and tumor location also showed important associations with intrapulmonary metastatic patterns. In the Bayesian network and simulated intervention analyses, right-sided tumors were linked to a higher predicted probability of IPM. In the subgroup analysis restricted to patients with IPM, lesions in the lower and middle lobes had higher odds than upper lobe lesions of involving different lobes or both the same and different lobes. Previous studies have either not examined laterality in detail or have reported no significant association between laterality and IPM (24) or prognosis (25-27). This divergence could reflect the large and diverse sample analysed in this study, which may have captured subtle associations that smaller studies could not detect. Differences in study methodology, such as sample size and statistical modeling techniques, may also account for these discrepancies. Notably, one retrospective-prospective study found that most nodules in patients with more than five GGNs occurred unilaterally, mainly in the right lung and upper lobe (28).
In contrast, sociodemographic variables such as sex, race, marital status, income and rural-urban continuum were not consistently associated with intrapulmonary metastatic patterns once tumor characteristics were taken into account. The Bayesian network did connect these variables to one another in clinically plausible ways. For example, race was linked to income and the rural-urban continuum, and sex was linked to marital status. However, these chains did not extend strongly to lung cancer type in the final models. This pattern suggests that, in this SEER cohort, variation in IPM versus SPLC is more tightly linked to tumor biology and anatomical distribution than to measured social factors. However, residual confounding by unmeasured exposures and access to care cannot be excluded.
Several limitations should be acknowledged. First, the study relied on data from the SEER database, which may have introduced bias due to missing or incomplete data, particularly for variables such as smoking history, genetic mutations, and treatment modalities. Second, because SEER data are cross-sectional and lack temporal information, we interpret both the Bayesian network and the SEM as descriptive summaries of association patterns rather than as validated causal structures. Likewise, the simulated interventions should be understood as contrasts within the fitted probabilistic model, not as predictions of the effects of real-world clinical interventions. Future work incorporating imaging-based measures, molecular profiles, and longitudinal follow-up in prospective cohorts will be needed to clarify temporal sequences and to evaluate how these association patterns can best be integrated into diagnostic and treatment decision-making.
Conclusions
In conclusion, poorer histological grade and right-sided tumors were consistently associated with a higher probability of IPM, whereas measured sociodemographic variables showed weaker and less consistent associations after adjustment for tumor features. These findings, interpreted as observational associations, suggest that detailed assessment of tumor differentiation and anatomical extent may assist in characterising complex intrapulmonary disease patterns.
Supplementary
The article’s supplementary files as
Acknowledgments
We would like to thank the National Cancer Institute and the SEER Program tumor registries for their efforts in creating the SEER database.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Footnotes
Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1085/rc
Funding: This study was supported by the Research Grants Council (RGC) of Hong Kong (No. 14111222).
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1085/coif). R.W.H.L. received consulting fees from Medtronic and Siemens Healthineers. C.S.H.N. received consulting fees from Johnson and Johnson, Medtronic, Olympus and Siemens Healthineers. The other authors have no conflicts of interest to declare.
References
1. Kratzer TB, Bandi P, Freedman ND, et al. Lung cancer statistics, 2023. Cancer 2024;130:1330-48. doi: 10.1002/cncr.35128
2. LoPiccolo J, Gusev A, Christiani DC, et al. Lung cancer in patients who have never smoked - an emerging disease. Nat Rev Clin Oncol 2024;21:121-46. doi: 10.1038/s41571-023-00844-0
3. Mao Y, Yang D, He J, et al. Epidemiology of Lung Cancer. Surg Oncol Clin N Am 2016;25:439-45. doi: 10.1016/j.soc.2016.02.001
4. Lam DC, Liam CK, Andarini S, et al. Lung Cancer Screening in Asia: An Expert Consensus Report. J Thorac Oncol 2023;18:1303-22. doi: 10.1016/j.jtho.2023.06.014
5. National Lung Screening Trial Research Team; Aberle DR, Adams AM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409. doi: 10.1056/NEJMoa1102873
6. Heraganahally SS, Howarth TP, Sorger L. Chest computed tomography findings among adult Indigenous Australians in the Northern Territory of Australia. J Med Imaging Radiat Oncol 2022;66:337-44. doi: 10.1111/1754-9485.13295
7. Xue W, Pan B, Huang F, et al. Imaging Features of Lung Ground-Glass Nodules and Their Correlation With Biological Behavior. Cureus 2025;17:e97775. doi: 10.7759/cureus.97775
8. Liang Y, He K, Zhang B, et al. The Diagnosis of Intrapulmonary Metastasis Multifocal Pulmonary Ground-Glass Nodules Based on Oncogenic Driver Mutation: Two Case Reports and Review of Literature. Front Surg 2021;8:812559. doi: 10.3389/fsurg.2021.812559
9. Xu B, Chen Z, Liu D, et al. Comparison of Outcomes Between Ablation and Lobectomy in Stage IA Non-Small Cell Lung Cancer: A Retrospective Multicenter Study. Respirology 2025;30:1192-201. doi: 10.1111/resp.70116
10. Ganti AK, Klein AB, Cotarla I, et al. Update of Incidence, Prevalence, Survival, and Initial Treatment in Patients With Non-Small Cell Lung Cancer in the US. JAMA Oncol 2021;7:1824-32. doi: 10.1001/jamaoncol.2021.4932
11. Araujo-Filho JAB, Chang J, Mayoral M, et al. Are there imaging characteristics that can distinguish separate primary lung carcinomas from intrapulmonary metastases using next-generation sequencing as a gold standard? Lung Cancer 2021;153:158-64. doi: 10.1016/j.lungcan.2021.01.019
12. Detterbeck FC, Franklin WA, Nicholson AG, et al. The IASLC Lung Cancer Staging Project: Background Data and Proposed Criteria to Distinguish Separate Primary Lung Cancers from Metastatic Foci in Patients with Two Lung Tumors in the Forthcoming Eighth Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11:651-65.
13. Chang JC, Alex D, Bott M, et al. Comprehensive Next-Generation Sequencing Unambiguously Distinguishes Separate Primary Lung Carcinomas From Intrapulmonary Metastases: Comparison with Standard Histopathologic Approach. Clin Cancer Res 2019;25:7113-25. doi: 10.1158/1078-0432.CCR-19-1700
14. Kar I, Vhora F, Bou Zerdan M, et al. Survival Determinants and Sociodemographic Disparities in Early-Onset Non-Small Cell Lung Cancer. JAMA Netw Open 2025;8:e2537307. doi: 10.1001/jamanetworkopen.2025.37307
15. Zeng Y, Lin F, Zhang L, et al. Development and validation of a machine learning model and nomogram for predicting brain metastasis in lung cancer: a population-based study. Clinics (Sao Paulo) 2025;80:100843. doi: 10.1016/j.clinsp.2025.100843
16. Howey R, Shin SY, Relton C, et al. Bayesian network analysis incorporating genetic anchors complements conventional Mendelian randomization approaches for exploratory analysis of causal relationships in complex data. PLoS Genet 2020;16:e1008198. doi: 10.1371/journal.pgen.1008198
17. Hozumi H, Shimizu H. Bayesian network enables interpretable and state-of-the-art prediction of immunotherapy responses in cancer patients. PNAS Nexus 2023;2:pgad133. doi: 10.1093/pnasnexus/pgad133
18. Oh JH, Craft J, Al Lozi R, et al. A Bayesian network approach for modeling local failure in lung cancer. Phys Med Biol 2011;56:1635-51. doi: 10.1088/0031-9155/56/6/008
19. Sesen MB, Nicholson AE, Banares-Alcantara R, et al. Bayesian networks for clinical decision support in lung cancer care. PLoS One 2013;8:e82349. doi: 10.1371/journal.pone.0082349
20. Long E, Xu S, Liu Z, et al. Construction and implications of structural equation modeling network for pediatric cataract: a data mining research of rare diseases. BMC Ophthalmol 2017;17:74. doi: 10.1186/s12886-017-0468-5
21. Ma Q, Luo J, Cao H, et al. Social support, health behavior self-efficacy, and anxiety on physical activity levels among lung cancer survivors: a structural equation modeling. J Cancer Surviv 2024. [Epub ahead of print]. doi: 10.1007/s11764-024-01626-y
22. Du M, Xin J, Zheng R, et al. CYP2A6 Activity and Cigarette Consumption Interact in Smoking-Related Lung Cancer Susceptibility. Cancer Res 2024;84:616-25. doi: 10.1158/0008-5472.CAN-23-0900
23. Ghossein J, Gingras S, Zeng W. Differentiating primary from secondary lung cancer with FDG PET/CT and extra-pulmonary tumor grade. J Med Imaging Radiat Sci 2023;54:451-6. doi: 10.1016/j.jmir.2023.05.045
24. Farjah F, Monsell SE, Greenlee RT, et al. Patient and Nodule Characteristics Associated With a Lung Cancer Diagnosis Among Individuals With Incidentally Detected Lung Nodules. Chest 2023;163:719-30. doi: 10.1016/j.chest.2022.09.030
25. Nie Y, Wang X, Yang F, et al. Surgical Prognosis of Synchronous Multiple Primary Lung Cancer: Systematic Review and Meta-Analysis. Clin Lung Cancer 2021;22:341-350.e3. doi: 10.1016/j.cllc.2020.10.022
26. Alghamdi HI, Alshehri AF, Farhat GN. An overview of mortality & predictors of small-cell and non-small cell lung cancer among Saudi patients. J Epidemiol Glob Health 2018;7 Suppl 1:S1-6. doi: 10.1016/j.jegh.2017.09.004
27. Jørgensen N, Meline EL, Jeppesen SS, et al. The effect of tumor laterality on survival for non-small cell lung cancer patients treated with radiotherapy. Acta Oncol 2019;58:1393-8. doi: 10.1080/0284186X.2019.1629011
28. Wu J, Mo A, Zhuang W, et al. Distinctive clinical and radiological characteristics of persistent super-multiple ground-glass pulmonary nodules. Int J Cancer 2026;158:60-8. doi: 10.1002/ijc.70066