Abstract
Objectives
To construct a pathomics-based machine learning model to enhance the diagnostic efficacy of LungPro navigational bronchoscopy for peripheral pulmonary lesions and to optimize the management strategy for LungPro-diagnosed negative lesions.
Methods
Clinical data and hematoxylin and eosin (H&E)-stained whole slide images (WSIs) were collected from 144 consecutive patients who underwent LungPro navigational bronchoscopy at a single institution between January 2022 and December 2023. Patients were stratified into diagnosis-positive and diagnosis-negative cohorts based on histopathological or etiological confirmation. An artificial intelligence (AI) model was developed and validated using the 94 diagnosis-positive cases. Logistic regression (LR) was used to identify clinical and imaging risk factors for malignant pulmonary lesions. We implemented a convolutional neural network (CNN) with weakly supervised learning to extract image-level features, followed by multiple instance learning (MIL) for patient-level feature aggregation. Multiple machine learning (ML) algorithms were applied to model the extracted features. A multimodal diagnostic framework integrating clinical, imaging, and pathomics data was subsequently developed and evaluated on the 50 LungPro-negative patients to assess its diagnostic performance and predictive validity.
Results
Univariable and multivariable logistic regression analyses identified age, lesion boundary, and mean computed tomography (CT) attenuation as independent risk factors for malignant peripheral pulmonary lesions (P < 0.05). A histopathological model using a MIL fusion strategy showed strong diagnostic performance for lung cancer, with area under the curve (AUC) values of 0.792 (95% CI 0.680–0.903) in the training cohort and 0.777 (95% CI 0.531–1.000) in the test cohort. Combining predictive clinical features with pathological characteristics further improved performance for peripheral pulmonary lesions, yielding an AUC of 0.848 (95% CI 0.6945–1.0000). In patients with initially negative LungPro biopsy results, the model correctly identified 20 of 28 malignant lesions (sensitivity: 71.43%) and 15 of 22 benign lesions (specificity: 68.18%). Class activation mapping (CAM) supported the model's validity by highlighting key malignant features, including conspicuous nucleoli and nuclear atypia.
Conclusions
The fusion diagnostic model that incorporates clinical and pathomic features markedly enhances the diagnostic accuracy of LungPro in this retrospective cohort. This model aids in the detection of subtle malignant characteristics, thereby offering evidence to support precise and targeted therapeutic interventions for lesions that LungPro classifies as negative in clinical settings.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12938-025-01440-2.
Keywords: LungPro navigational bronchoscopy, Pathomics, Peripheral pulmonary lesions, Diagnosis, Lung cancer
Introduction
Lung cancer remains the leading cause of cancer-related mortality worldwide, underscoring the critical importance of early diagnosis and intervention in reducing its high fatality rate [1]. The widespread adoption of CT has significantly increased the detection rate of indeterminate peripheral pulmonary lesions (PPLs), making the qualitative diagnosis of lung cancer presenting as PPLs a pivotal challenge in the realm of early lung cancer detection.
Extensive clinical research has demonstrated that integrating advanced navigation technologies [2] and robotic-assisted systems [3] not only enhances the precision of interventional diagnosis and treatment but also reduces radiation exposure and mitigates the variability introduced by operator skill and experience. These advances have provided a solid foundation for the minimally invasive diagnosis and treatment of PPLs via bronchoscopic approaches [4]. Within this array of technologies, LungPro, an optical lung navigation system that integrates augmented reality and infrared optical tracking, has risen to prominence as a widely adopted navigation system in clinical settings [5]. Its primary breakthrough is dual-channel synchronous visualization, which displays virtual bronchoscopic 3D animations alongside the actual bronchoscopic view; spatial mapping overlays a predetermined navigation route onto the live images, greatly improving the accuracy of targeting endobronchial lesions [6]. In addition, the system offers multimodal image fusion: it reconstructs 3D representations of the tracheobronchial tree and pulmonary vessels to plan routes that avoid vessel puncture, and it can generate virtual fluoroscopic images from intraoperative X-ray scans to visualize specific lesions and bronchial routes [7]. Furthermore, LungPro extends its diagnostic reach beyond the bronchial lumen via bronchoscopic transparenchymal nodule access (BTPNA), creating transparenchymal tunnels to sample extrabronchial areas and thus enhancing diagnostic thoroughness. Despite these technological benefits, LungPro, like other navigational technologies, faces several constraints.
Chief among these is a notably elevated false-negative rate, owing to discrepancies between intraoperative lesion localization and preoperative CT imaging [8]. Sampling difficulties inherent to navigational bronchoscopy compound these false negatives: samples collected by methods such as endobronchial ultrasound-guided transbronchial needle aspiration (EBUS–TBNA) or biopsy forceps [9] are typically cytological specimens with compromised tissue architecture and low cell yield [10]. Research shows that, even for easily accessible tumors, such samples contain on average only 20–25% tumor tissue, and half are fragmented and devoid of tumor cells [11]. The core challenges of diagnosing malignancy bronchoscopically, namely insufficient sample size, poor tumor differentiation, and intratumoral heterogeneity, therefore likely contribute substantially to diagnostic failure and to LungPro's high false-negative rate; this study primarily aims to enhance the system's diagnostic precision by addressing these issues.
In recent years, the rapid advancement of AI has catalyzed the development of digital pathology. By leveraging CNNs [12] to automatically extract high-throughput image features, researchers can uncover predictive biomarkers hidden within large data sets, enhance diagnostic accuracy for complex and rare cases, and facilitate early cancer detection. Notably, deep learning-based pathology models have demonstrated remarkable success in diagnosing biopsy specimens across a wide range of cancers, including prostate cancer [13], breast cancer [14], and intraductal papillary mucinous neoplasms [15]. These models exhibit diagnostic accuracy on par with that of experienced pathologists while significantly improving the detection of rare subtypes and subtle lesions that are often difficult to identify with conventional methods [16]. Moreover, techniques such as CAM [17] provide visual localization of key diagnostic regions, improving the interpretability and transparency of model predictions. While pathomics plays a significant role in cancer diagnosis by leveraging biopsy-based data, synergistic integration with complementary modalities such as radiomics can maximize its clinical impact. The work of Canfora et al. [18] is a prime example, showing how CT-based radiomics enables presurgical stratification of cancer patients into low- and high-risk groups, offering insights into biopsy outcomes that are otherwise unattainable and pioneering noninvasive prediction of lesion biology. In clinical practice, pathomics enhances pathologists' diagnostic efficiency through multimodal data integration encompassing clinical, imaging, and molecular information [19–21]. Combined with advances such as radiomics, this integrated approach holds great promise for early cancer screening and personalized treatment strategies, ultimately contributing to more precise and effective cancer management.
Building on these advancements, we propose a clinical–imaging–pathological multi-omics fusion strategy to establish a malignant risk prediction model for LungPro navigational bronchoscopy samples (lavage/brushing/forceps biopsies). By integrating clinical, radiomic, and pathomic features, the model is designed to non-invasively predict and analyze potentially malignant lesions, thereby enhancing the diagnostic accuracy of LungPro procedures. Furthermore, this approach is expected to optimize clinical management strategies for PPLs initially classified as negative by LungPro. Specifically, it will provide valuable guidance for clinical decision-making, such as determining the need for surgical intervention, percutaneous lung biopsy, or follow-up surveillance, ultimately improving patient outcomes (Figs. 1, 2).
Fig. 1.
Workflow of this study
Fig. 2.
Flowchart of patient selection. A total of 225 lung nodules from 218 patients were screened, with lung nodules from 144 patients being included in the study
Results
Clinical features
This study enrolled 144 patients with PPLs, including 94 malignant and 50 benign cases confirmed by final diagnosis (Table 1). Adenocarcinoma constituted the predominant histological subtype among malignancies. Initial LungPro biopsies showed limited diagnostic efficacy for rare pulmonary neoplasms (e.g., spindle cell carcinoma and lymphoepithelioma-like carcinoma). LungPro navigation bronchoscopy demonstrated a diagnostic yield of 65.28% for PPLs. Detailed clinical and navigational information for patients in the diagnostic and non-diagnostic groups is presented in Table 2. Comparison of the clinical and navigational data between the two groups revealed no significant differences in age, sex, smoking history, bronchus sign, lobulation, spiculation, vascular convergence, lesion location, distance to the pleura, or incidence of severe complications. However, statistically significant differences were observed between the groups regarding lesion character (P = 0.032), lesion diameter (P = 0.025), lesion volume (P = 0.029), distance between the lesion and opening of the lobar bronchus (P = 0.047), bronchial generation of bronchoscope (P = 0.008), and number of bronchi communicating with the lesion (P < 0.001).
Table 1.
Distribution of patient group and final pathological diagnostic results of study subjects
| Pathological type (%) | LungPro diagnostic group (n = 94) | LungPro non-diagnostic group (n = 50) |
|---|---|---|
| Benign | 28 (29.79) | 22 (44.0) |
| Tuberculosis | 12 (12.77) | 2 (4.0) |
| Organizing pneumonia | 15 (15.96) | 14 (28.0) |
| Benign tumor | ||
| Pulmonary epithelioid hemangioendothelioma | 1 (1.06) | 0 |
| Sclerosing pneumocytoma | 0 | 2 (4.0) |
| Inflammatory myofibroblastic tumor | 0 | 4 (8.0) |
| Malignant | 66 (70.21) | 28 (56.0) |
| NSCLC | ||
| Adenocarcinoma | 45 (47.87) | 22 (44.0) |
| Squamous cell carcinoma | 8 (8.51) | 0 |
| Large cell carcinoma | 1 (1.06) | 0 |
| Adenosquamous carcinoma | 0 | 1 (2.0) |
| Lymphoepithelioma-like carcinoma | 0 | 1 (2.0) |
| NSCLC, not otherwise specified | 6 (6.39) | 1 (2.0) |
| Sarcomatoid carcinoma | ||
| Pleomorphic carcinoma | 1 (1.06) | 2 (4.0) |
| Spindle cell carcinoma | 0 | 1 (2.0) |
| Small cell carcinoma | 5 (5.32) | 0 |
Table 2.
Comparison of clinical and navigational characteristics between patients in the LungPro diagnostic and non-diagnostic groups
| Variables | Total (n = 144) | Diagnostic group (n = 94) | Non-diagnostic group (n = 50) | P |
|---|---|---|---|---|
| Age, mean ± SD (range), years | 60.62 ± 12.12 | 61.33 ± 11.25 | 59.28 ± 13.62 | 0.336 |
| Lesion diameter, mean ± SD (range), mm | 31.98 ± 12.85 | 33.71 ± 12.74 | 28.66 ± 12.53 | 0.025 |
| Volume, mean ± SD, mm3 | 771.78 ± 581.88 | 848.39 ± 566.08 | 623.34 ± 589.22 | 0.029 |
| Distance between the lesion and opening of the lobar bronchus, mean ± SD, mm | 38.66 ± 15.27 | 36.82 ± 14.35 | 42.12 ± 16.47 | 0.047 |
| Distance to the pleura, mean ± SD, mm | 11.23 ± 15.09 | 10.93 ± 15.32 | 11.80 ± 14.79 | 0.742 |
| Mean CT attenuation, M (P25, P75), HU | − 55.04 (− 120.90, 3.40) | − 55.08 (− 106.72, − 3.00) | − 41.09 (− 155.49, 9.91) | 0.656 |
| Gender, n(%) | ||||
| Female | 73 (50.69) | 48 (51.06) | 25 (50.00) | 0.903 |
| Male | 71 (49.31) | 46 (48.94) | 25 (50.00) | |
| Smoking history, n(%) | ||||
| Yes | 74 (51.39) | 49 (52.13) | 25 (50.00) | 0.808 |
| No | 70 (48.61) | 45 (47.87) | 25 (50.00) | |
| Character, n(%) | ||||
| Solid | 130 (90.28) | 89 (94.68) | 41 (82.00) | 0.032 |
| Part solid | 14 (9.72) | 5 (5.32) | 9 (18.00) | |
| Lobulation, n(%) | ||||
| Absent | 50 (34.72) | 31 (32.98) | 19 (38.00) | 0.547 |
| Present | 94 (65.28) | 63 (67.02) | 31 (62.00) | |
| Spiculation, n(%) | ||||
| Absent | 60 (41.67) | 38 (40.43) | 22 (44.00) | 0.679 |
| Present | 84 (58.33) | 56 (59.57) | 28 (56.00) | |
| Vascular convergence, n(%) | ||||
| Absent | 65 (45.14) | 45 (47.87) | 20 (40.00) | 0.366 |
| Present | 79 (54.86) | 49 (52.13) | 30 (60.00) | |
| Pleural indentation, n(%) | ||||
| Absent | 63 (43.75) | 37 (39.36) | 26 (52.00) | 0.146 |
| Present | 81 (56.25) | 57 (60.64) | 24 (48.00) | |
| Bronchus sign on CT, n(%) | ||||
| Absent | 30 (20.83) | 16 (17.02) | 14 (28.00) | 0.122 |
| Present | 114 (79.17) | 78 (82.98) | 36 (72.00) | |
| Lobar location, n(%) | ||||
| Left upper segment | 27 (18.75) | 21 (22.34) | 6 (12.00) | 0.653 |
| Left lingular segment | 7 (4.86) | 5 (5.32) | 2 (4.00) | |
| Left lower lobe | 17 (11.81) | 11 (11.70) | 6 (12.00) | |
| Right upper lobe | 51 (35.42) | 33 (35.11) | 18 (36.00) | |
| Right middle lobe | 10 (6.94) | 6 (6.38) | 4 (8.00) | |
| Right lower lobe | 32 (22.22) | 18 (19.15) | 14 (28.00) | |
| Lesion boundary, n(%) | ||||
| Well-defined | 117 (81.25) | 76 (80.85) | 41 (82.00) | 0.866 |
| Poor-defined | 27 (18.75) | 18 (19.15) | 9 (18.00) | |
| Bronchial generation of bronchoscope, n(%) | ||||
| < Generation 5 | 120 (83.33) | 84 (89.36) | 36 (72.00) | 0.008 |
| ≥ Generation 5 | 24 (16.67) | 10 (10.64) | 14 (28.00) | |
| Number of bronchi communicating with the lesion, n(%) | ||||
| ≤ 3 | 126 (87.50) | 76 (80.85) | 50 (100.00) | < 0.001 |
| > 3 | 18 (12.50) | 18 (19.15) | 0 (0.00) | |
| Complications, n(%) | ||||
| None | 133 (92.36) | 84 (89.36) | 49 (98.00) | 0.165 |
| Massive hemorrhage | 10 (6.94) | 9 (9.57) | 1 (2.00) | |
| Pneumothorax | 1 (0.69) | 1 (1.06) | 0 (0.00) | |
Using a random number table, patients with positive diagnoses were stratified into training (n = 66) and internal validation (n = 28) cohorts at a 7:3 ratio. Table 3 summarizes baseline clinical characteristics and navigational imaging parameters. No statistically significant differences were observed between cohorts for any baseline characteristic, including continuous variables (age, lesion diameter, volume, mean CT attenuation, distance to the pleural surface, etc.) and categorical variables (sex, smoking status, final diagnosis, and imaging features) (all P > 0.05). Comparable malignancy rates were observed in the training (68.18%) and validation cohorts (78.57%, P = 0.309), confirming balanced diagnostic label distribution.
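The 7:3 stratified split described above can be sketched as follows; scikit-learn's `train_test_split` stands in for the random number table used in the study, and the labels here are synthetic placeholders:

```python
# Illustrative 7:3 split of the 94 diagnosis-positive cases (train_test_split
# stands in for the study's random number table; labels are placeholders).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
patient_ids = np.arange(94)               # 94 diagnosis-positive patients
labels = rng.integers(0, 2, size=94)      # placeholder benign/malignant labels

train_ids, test_ids = train_test_split(
    patient_ids,
    test_size=28,                         # matches the study's 66/28 split
    random_state=42,
    stratify=labels,                      # keep label balance across cohorts
)
print(len(train_ids), len(test_ids))      # → 66 28
```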
Table 3.
Baseline characteristics of the patients in the diagnostic positive group
| Variables | All (n = 94) | Train (n = 66) | Test (n = 28) | P |
|---|---|---|---|---|
| Age, mean ± SD (range), years | 61.45 ± 11.31 | 61.52 ± 11.11 | 61.29 ± 11.97 | 0.929 |
| Lesion diameter, mean ± SD (range), mm | 33.81 ± 12.84 | 32.83 ± 12.22 | 36.11 ± 14.17 | 0.261 |
| Volume, mean ± SD, mm3 | 852.65 ± 567.11 | 801.14 ± 553.36 | 972.22 ± 590.61 | 0.183 |
| Mean CT attenuation, M (P25, P75), HU | − 55.04 (− 105.68, − 0.84) | − 48.83 (− 96.62, − 0.84) | − 73.35 (− 126.05, 0.42) | 0.400 |
| Distance between the lesion and opening of the lobar bronchus, mean ± SD, mm | 35.85 ± 12.57 | 35.44 ± 12.56 | 36.82 ± 12.78 | 0.629 |
| Distance to the pleura, mean ± SD, mm | 10.99 ± 15.36 | 11.20 ± 14.08 | 10.50 ± 18.31 | 0.842 |
| Gender, n(%) | ||||
| Female | 48 (51.06) | 31 (46.97) | 17 (60.71) | 0.223 |
| Male | 46 (48.94) | 35 (53.03) | 11 (39.29) | |
| Smoking history, n(%) | ||||
| No | 49 (52.13) | 32 (48.48) | 17 (60.71) | 0.278 |
| Yes | 45 (47.87) | 34 (51.52) | 11 (39.29) | |
| Character, n(%) | ||||
| Solid | 90 (95.74) | 64 (96.97) | 26 (92.86) | 0.730 |
| Part solid | 4 (4.26) | 2 (3.03) | 2 (7.14) | |
| Lobulation, n(%) | ||||
| Absent | 30 (31.91) | 22 (33.33) | 8 (28.57) | 0.651 |
| Present | 64 (68.09) | 44 (66.67) | 20 (71.43) | |
| Spiculation, n(%) | ||||
| Absent | 38 (40.43) | 25 (37.88) | 13 (46.43) | 0.440 |
| Present | 56 (59.57) | 41 (62.12) | 15 (53.57) | |
| Vascular convergence, n(%) | ||||
| Absent | 44 (46.81) | 34 (51.52) | 10 (35.71) | 0.160 |
| Present | 50 (53.19) | 32 (48.48) | 18 (64.29) | |
| Pleural indentation, n(%) | ||||
| Absent | 37 (39.36) | 24 (36.36) | 13 (46.43) | 0.361 |
| Present | 57 (60.64) | 42 (63.64) | 15 (53.57) | |
| Bronchus sign on CT, n(%) | ||||
| Absent | 15 (15.96) | 11 (16.67) | 4 (14.29) | 1.000 |
| Present | 79 (84.04) | 55 (83.33) | 24 (85.71) | |
| Lobar location, n(%) | ||||
| Left upper segment | 20 (21.28) | 18 (27.27) | 2 (7.14) | 0.134 |
| Left lingular segment | 5 (5.32) | 3 (4.55) | 2 (7.14) | |
| Left lower lobe | 10 (10.64) | 7 (10.61) | 3 (10.71) | |
| Right upper lobe | 34 (36.17) | 19 (28.79) | 15 (53.57) | |
| Right middle lobe | 6 (6.38) | 5 (7.58) | 1 (3.57) | |
| Right lower lobe | 19 (20.21) | 14 (21.21) | 5 (17.86) | |
| Lesion boundary, n(%) | ||||
| Well-defined | 77 (81.91) | 57 (86.36) | 20 (71.43) | 0.085 |
| Poor-defined | 17 (18.09) | 9 (13.64) | 8 (28.57) | |
| Bronchial generation of bronchoscope, n(%) | ||||
| < Generation 5 | 86 (91.49) | 62 (93.94) | 24 (85.71) | 0.367 |
| ≥ Generation 5 | 8 (8.51) | 4 (6.06) | 4 (14.29) | |
| Number of bronchi communicating with the lesion, n(%) | ||||
| ≤ 3 | 76 (80.85) | 55 (83.33) | 21 (75.00) | 0.348 |
| > 3 | 18 (19.15) | 11 (16.67) | 7 (25.00) | |
| Final diagnosis | ||||
| Benign | 27 (28.72) | 21 (31.82) | 6 (21.43) | 0.309 |
| Malignant | 67 (71.28) | 45 (68.18) | 22 (78.57) | |
Table 4 presents the results of univariate logistic regression analysis in the training cohort, identifying imaging features and clinical variables with statistically significant discriminatory power for distinguishing benign from malignant lesions: lobulation, spiculation, vascular convergence, pleural indentation, bronchus sign on CT, lesion boundary, age, and mean CT attenuation (all P < 0.05). Following adjustment for confounding factors, subsequent multivariate logistic regression analysis confirmed age (P = 0.005; OR = 1.11, 95% CI 1.03–1.19), lesion boundary (P = 0.005; OR = 0.05, 95% CI 0.01–0.40), and mean CT attenuation (P = 0.006; OR = 0.99, 95% CI 0.98–0.99) as independent predictors of malignancy. These quantitative radiomic biomarkers were subsequently incorporated as core components within the multimodal pathomics diagnostic framework.
Table 4.
Univariable and multivariable analyses of clinical features in predicting benign and malignant PPLs
| Variables | Univariate analysis | Multivariate analysis | ||
|---|---|---|---|---|
| P | OR (95%CI) | P | OR (95%CI) | |
| Gender | ||||
| Female | 1.00 (Reference) | |||
| Male | 0.416 | 0.69 (0.28–1.69) | | |
| Smoking history | ||||
| No | 1.00 (Reference) | |||
| Yes | 0.973 | 0.98 (0.40–2.41) | | |
| Character | ||||
| Solid | 1.00 (Reference) | |||
| Part solid | 0.867 | 1.22 (0.12–12.26) | ||
| Lobulation | ||||
| Absent | 1.00 (Reference) | 1.00 (Reference) | ||
| Present | 0.003 | 4.33 (1.67–11.23) | 0.374 | 2.24 (0.38–13.29) |
| Spiculation | ||||
| Absent | 1.00 (Reference) | 1.00 (Reference) | ||
| Present | 0.006 | 3.72 (1.46–9.50) | 0.768 | 0.75 (0.12–4.94) |
| Vascular convergence | ||||
| Absent | 1.00 (Reference) | 1.00 (Reference) | ||
| Present | < 0.001 | 9.00 (3.01–26.95) | 0.067 | 3.95 (0.91–17.21) |
| Pleural indentation | ||||
| Absent | 1.00 (Reference) | 1.00 (Reference) | ||
| Present | 0.044 | 2.56 (1.02–6.38) | 0.427 | 1.87 (0.40–8.69) |
| Bronchus sign on CT | ||||
| Absent | 1.00 (Reference) | 1.00 (Reference) | ||
| Present | 0.006 | 5.08 (1.60–16.20) | 0.109 | 4.17 (0.73–23.89) |
| Lobar location | ||||
| Left upper segment | 1.00 (Reference) | |||
| Left lingular segment | 1.000 | 1.00 (0.09–11.59) | | |
| Left lower lobe | 1.000 | 1.00 (0.15–6.67) | | |
| Right upper lobe | 0.332 | 0.52 (0.14–1.94) | | |
| Right middle lobe | 0.992 | 10,636,203.08 (0.00 – Inf) | ||
| Right lower lobe | 0.077 | 0.28 (0.07–1.15) | ||
| Lesion boundary | ||||
| Well-defined | 1.00 (Reference) | 1.00 (Reference) | ||
| Poor-defined | < 0.001 | 0.07 (0.02–0.24) | 0.005 | 0.05 (0.01–0.40) |
| Bronchial generation of bronchoscope | ||||
| < Generation 5 | 1.00 (Reference) | |||
| ≥ Generation 5 | 0.990 | 19,469,659.89 (0.00 – Inf) | ||
| Number of bronchi communicating with the lesion | ||||
| ≤ 3 | 1.00 (Reference) | |||
| > 3 | 0.083 | 3.92 (0.84–18.40) | ||
| Age | 0.004 | 1.07 (1.02–1.12) | 0.005 | 1.11 (1.03–1.19) |
| Lesion size | 0.130 | 1.03 (0.99–1.07) | ||
| Volume | 0.330 | 1.00 (1.00–1.00) | ||
| Mean CT attenuation | 0.018 | 0.99 (0.99–0.99) | 0.006 | 0.99 (0.98–0.99) |
| Distance between the lesion and opening of the lobar bronchus | 0.561 | 1.01 (0.97–1.05) | ||
| Distance to the pleura | 0.555 | 1.01 (0.98–1.04) | ||
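The odds ratios in Table 4 are obtained by exponentiating logistic regression coefficients. A minimal sketch on synthetic data (the variable names mirror the study's predictors, but the data and effect sizes are illustrative assumptions):

```python
# Sketch of deriving odds ratios from a logistic regression, as in Table 4.
# Synthetic data; fitted ORs will not match the paper's values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 66                                       # size of the training cohort
age = rng.normal(61, 11, n)
boundary = rng.integers(0, 2, n)             # 0 = well-defined, 1 = poorly defined
ct_hu = rng.normal(-50, 60, n)               # mean CT attenuation (HU)
X = np.column_stack([age, boundary, ct_hu])

# Simulated outcome loosely following the reported directions of effect
logit = 0.08 * (age - 61) - 2.0 * boundary - 0.01 * ct_hu
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Large C makes the fit effectively unpenalized, like classical LR
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
odds_ratios = np.exp(model.coef_[0])         # one OR per predictor
for name, orr in zip(["age", "boundary", "mean_CT_attenuation"], odds_ratios):
    print(f"{name}: OR = {orr:.2f}")
```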
Patch-level prediction
Patch-level efficiency
The AUC values presented in Table 5 illustrate the performance of three deep learning models—DenseNet121, ResNet50, and ResNet18—across training and testing cohorts. DenseNet121 achieved an exceptionally high AUC of 0.989 (95% CI 0.9889–0.9895) in the training cohort but experienced a significant drop to 0.766 (95% CI 0.7637–0.7690) in the test cohort, indicating overfitting. Similarly, ResNet50 demonstrated strong training performance (AUC = 0.962; 95% CI 0.9616–0.9629) but reduced generalizability in testing (AUC = 0.786; 95% CI 0.7833–0.7885). In contrast, ResNet18 achieved the highest test AUC of 0.790 (95% CI 0.7872–0.7923) despite a slightly lower training AUC of 0.947 (95% CI 0.9465–0.9481), suggesting superior stability across the data set.
Table 5.
Patch-level accuracy and AUC scores for each CNN model
| Model Name | Acc | AUC | 95% CI | Sensitivity | Specificity | PPV | NPV | Cohort |
|---|---|---|---|---|---|---|---|---|
| densenet121 | 0.945 | 0.989 | 0.9889–0.9895 | 0.946 | 0.941 | 0.984 | 0.817 | Train |
| densenet121 | 0.714 | 0.766 | 0.7637–0.7690 | 0.816 | 0.617 | 0.671 | 0.778 | Test |
| resnet50 | 0.897 | 0.962 | 0.9616–0.9629 | 0.904 | 0.873 | 0.965 | 0.700 | Train |
| resnet50 | 0.746 | 0.786 | 0.7833–0.7885 | 0.827 | 0.670 | 0.705 | 0.802 | Test |
| resnet18 | 0.875 | 0.947 | 0.9465–0.9481 | 0.880 | 0.852 | 0.958 | 0.648 | Train |
| resnet18 | 0.735 | 0.790 | 0.7872–0.7923 | 0.814 | 0.660 | 0.696 | 0.788 | Test |
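The paper does not state how the narrow patch-level confidence intervals in Table 5 were computed; a percentile bootstrap over patches is one common choice, sketched here on synthetic predictions:

```python
# Percentile-bootstrap 95% CI for a patch-level AUC (one common CI method;
# the study's exact procedure is not reported). Synthetic labels and scores.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 500)                       # placeholder patch labels
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 500), 0, 1)

aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))    # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                # skip degenerate resamples
        continue
    aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC = {roc_auc_score(y_true, y_prob):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```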
The AUC analysis demonstrates that ResNet18 achieved the best balance between training and testing efficacy among the evaluated models (Fig. 3); we therefore selected ResNet18 for constructing image-level features. However, because predictions may vary across tiles from the same patient, we introduced a MIL model to aggregate ResNet18's patch-level predictions and probabilities at the patient level, ensuring robust and reliable diagnostic outcomes.
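The MIL aggregation step described above can be sketched as pooling each patient's bag of patch probabilities into a fixed-length feature vector for the downstream classifiers (the specific pooling statistics below are assumptions for illustration; the study's exact aggregation is not detailed here):

```python
# MIL-style aggregation sketch: patch-level malignancy probabilities from the
# CNN are pooled into one patient-level feature vector. Pooling statistics
# chosen here are illustrative assumptions.
import numpy as np

def aggregate_patient(patch_probs: np.ndarray) -> np.ndarray:
    """Pool a bag of patch malignancy probabilities into patient features."""
    return np.array([
        patch_probs.mean(),                  # average evidence across the bag
        patch_probs.max(),                   # strongest single patch
        np.percentile(patch_probs, 75),      # upper-quartile evidence
        (patch_probs > 0.5).mean(),          # fraction of positive patches
    ])

rng = np.random.default_rng(3)
bag = rng.beta(2, 5, size=120)               # e.g., 120 patches from one WSI
features = aggregate_patient(bag)
print(features.round(3))
```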
Fig. 3.
Receiver operating characteristic (ROC) curves of model performance in differentiating benign and malignant lesions across cohorts in patch level. A Training cohort; B test cohort
Grad-CAM visualization
The gradient-weighted class activation mapping (Grad-CAM) [22, 23] method generates activation maps without requiring changes to the model architecture or additional training. Figure 4 illustrates its application, highlighting the activations of the final convolutional layer that drive the predicted class. By projecting these activations back onto the input image, the method reveals the regions most influential in the model's decision-making process, providing critical insight into how the model arrives at its predictions.
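Concretely, Grad-CAM weights each feature map of the final convolutional layer by its global-average-pooled gradient and applies a ReLU to the weighted sum. A framework-agnostic numpy sketch (in a real model, the activations and gradients would come from the CNN via backward hooks; here they are random placeholders):

```python
# Grad-CAM core computation: weights alpha_k are the global-average-pooled
# gradients of the class score w.r.t. each feature map A_k, and the heat map
# is ReLU(sum_k alpha_k * A_k), normalized to [0, 1].
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (K, H, W) arrays from the last conv layer."""
    weights = gradients.mean(axis=(1, 2))               # alpha_k, one per channel
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    return cam / cam.max() if cam.max() > 0 else cam    # normalize to [0, 1]

rng = np.random.default_rng(4)
A = rng.random((8, 7, 7))        # placeholder feature maps (8 channels)
G = rng.normal(size=(8, 7, 7))   # placeholder gradients of the class score
heatmap = grad_cam(A, G)
print(heatmap.shape)             # → (7, 7)
```

In practice, the heat map is upsampled to the input resolution and overlaid on the histopathological tile, as in Fig. 4.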
Fig. 4.
Representative visualization cases of tuberculosis (upper panel) and lung adenocarcinoma (lower panel). For each case, a histopathological tile image (left) is paired with its corresponding class activation heat map (right). Red regions in the heat maps indicate areas of heightened diagnostic relevance for the respective pathology, with intensity values scaled according to the adjacent color bar. Heat map weights reflect the relative contribution of tissue regions to the model's classification decision
Visualization of predictions
Our pathological model demonstrates high accuracy in identifying tumor regions within the tiles, as evidenced by the visualizations in Fig. 5. The combination of the prediction heat map and the probability map provides dual verification for distinguishing benign tissue from malignant adenocarcinoma. In the probability map, gland-dense regions appear yellow (P = 0.8–1.0), consistent with areas of nuclear atypia under pathological microscopy, whereas fibrous stromal and inflammatory regions, with visible lymphocytes and fibroblasts, appear dark blue (P = 0.0–0.4).
Fig. 5.
Diagnostic visualization for pulmonary adenocarcinoma in benign-malignant classification. A Histopathological H&E-stained biopsy specimen with ROI, where red denotes tumor regions, pink denotes inflammatory regions, and black indicates suspected necrotic areas, B class activation heat map highlighting regions of morphological significance, C malignancy probability distribution map with warm colors indicating higher diagnostic confidence (0–1 scale). Colored overlays in B and C demonstrate spatial concordance between histopathological features and model interpretability outputs
Patient-level prediction
In the patient-level prediction process, we employed a fivefold cross-validation strategy and used a Gridsearch algorithm to optimize the model’s hyperparameters, subsequently training the optimal model on the entire training set.
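The fivefold cross-validated grid search can be sketched with scikit-learn's `GridSearchCV`, which refits the best configuration on the full training set; the parameter grid and ExtraTrees example below are assumptions for illustration:

```python
# Sketch of the fivefold cross-validated grid search described above.
# Synthetic patient-level features; the actual search grid is an assumption.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(66, 4))             # placeholder patient-level MIL features
y = rng.integers(0, 2, 66)               # placeholder benign/malignant labels

search = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
    cv=5,                                # fivefold cross-validation
    scoring="roc_auc",
)
search.fit(X, y)                         # refits the best model on all data
print(search.best_params_, round(search.best_score_, 3))
```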
Table 6 presents the patient-level fivefold cross-validation results for three classical machine learning algorithms. The performance metrics reveal a substantial decline in the AUC of logistic regression (LR), from 0.688 (95% CI 0.540–0.835) in training to 0.417 (95% CI 0.037–0.796) in testing. This performance drop, together with the underfitting evident in its learning curve (Fig. 7), indicates LR's suboptimal performance and limited generalizability. In contrast, both RandomForest and ExtraTrees exhibited greater robustness and generalizability: their training AUCs were 0.868 (95% CI 0.775–0.962) and 0.792 (95% CI 0.680–0.903), respectively, and their test AUCs were 0.591 (95% CI 0.226–0.956) and 0.777 (95% CI 0.531–1.000), respectively (Fig. 6). Furthermore, the learning curves in Fig. 7 indicate a good fit for both the RandomForest and ExtraTrees models. These results collectively suggest that the nonlinear models, particularly ExtraTrees, better capture the inherent nonlinearity of the multiple instance learning features, a conclusion further supported by their robust test performance.
Table 6.
Metrics in train and test cohort in patient-level prediction
| model_name | Accuracy | AUC | 95% CI | Sensitivity | Specificity | PPV | NPV | Cohort |
|---|---|---|---|---|---|---|---|---|
| LR | 0.667 | 0.688 | 0.540–0.835 | 0.689 | 0.619 | 0.795 | 0.481 | Train |
| LR | 0.821 | 0.417 | 0.037–0.796 | 0.955 | 0.333 | 0.840 | 0.667 | Test |
| RandomForest | 0.773 | 0.868 | 0.775–0.962 | 0.756 | 0.810 | 0.895 | 0.607 | Train |
| RandomForest | 0.786 | 0.591 | 0.226–0.956 | 0.864 | 0.500 | 0.864 | 0.500 | Test |
| ExtraTrees | 0.727 | 0.792 | 0.680–0.903 | 0.778 | 0.619 | 0.814 | 0.565 | Train |
| ExtraTrees | 0.571 | 0.777 | 0.531–1.000 | 0.500 | 0.833 | 0.917 | 0.312 | Test |
Fig. 7.
Learning curves for train and test cohorts in LR (a), RandomForest (b) and ExtraTrees (c) for predicting benign and malignant PPLs
Fig. 6.
Receiver operating characteristic (ROC) curves for patient-level benign-malignant classification. A Performance comparison of three models in the training set; B performance comparison of three models in the test set
Signature comparison
To identify the optimal model, the top-performing algorithm on the test cohort was selected for each of the clinical and pathological frameworks. Within the pathological framework, ExtraTrees was chosen for its superior validation metrics; for clinical prediction, LR emerged as the optimal choice (rationale detailed in Supplementary 2 A). A combined model constructed by linear fusion of the LR and ExtraTrees outputs was then evaluated to assess synergistic effects against the standalone approaches (Table 7). The fusion strategy demonstrated marked diagnostic superiority over the individual models, yielding AUCs of 0.909 (95% CI 0.8115–1.0000) and 0.848 (95% CI 0.6945–1.0000) in the training and test sets, respectively (Fig. 8). As illustrated in Fig. 9, DeLong's test showed a statistically significant difference between the pathomics and combined models in the training set (P = 0.038), whereas no significant inter-model differences were observed in the test cohort (all P > 0.5).
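The linear fusion of the two model outputs can be sketched as a weighted average of their predicted probabilities (equal weights are an assumption here; the fusion weights are not reported in this section, and the labels and scores below are synthetic):

```python
# Sketch of linearly fusing clinical (LR) and pathomics (ExtraTrees)
# predicted probabilities. Equal weights and all data are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
y = rng.integers(0, 2, 28)                        # synthetic test-cohort labels
p_clinical = np.clip(y * 0.4 + rng.random(28) * 0.6, 0, 1)
p_pathomics = np.clip(y * 0.4 + rng.random(28) * 0.6, 0, 1)

p_combined = 0.5 * p_clinical + 0.5 * p_pathomics  # linear fusion
for name, p in [("clinical", p_clinical), ("pathomics", p_pathomics),
                ("combined", p_combined)]:
    print(f"{name}: AUC = {roc_auc_score(y, p):.3f}")
```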
Table 7.
Prediction performance of different signatures
| Signature | Accuracy | AUC | 95% CI | Sensitivity | Specificity | PPV | NPV | Cohort |
|---|---|---|---|---|---|---|---|---|
| Clinical | 0.758 | 0.871 | 0.7887–0.9541 | 0.644 | 1.000 | 1.000 | 0.568 | train |
| Pathomics | 0.773 | 0.792 | 0.6798–0.9032 | 0.844 | 0.619 | 0.826 | 0.650 | train |
| Combined | 0.939 | 0.909 | 0.8115–1.0000 | 0.978 | 0.857 | 0.936 | 0.947 | train |
| Clinical | 0.679 | 0.784 | 0.6019–0.9663 | 0.636 | 0.833 | 0.933 | 0.385 | test |
| Pathomics | 0.714 | 0.777 | 0.5313–1.0000 | 0.682 | 0.833 | 0.937 | 0.417 | test |
| Combined | 0.714 | 0.848 | 0.6945–1.0000 | 0.636 | 1.000 | 1.000 | 0.429 | test |
Fig. 8.
Comparison of AUC values for different signatures between the training (A) and test (B) cohorts
Fig. 9.
Receiver operating characteristic curve comparisons using DeLong's test to assess statistical differences between clinical, pathomics, and combined model performance. Significance threshold: P < 0.05
Clinical use
Calibration curve The Hosmer–Lemeshow (HL) test evaluates the calibration of predictive models by assessing the agreement between predicted probabilities and observed outcomes; a lower HL statistic indicates closer alignment of predictions with actual results. In this study, the HL test revealed significant miscalibration in the training cohort for both the clinical (training: P = 0.015; test: P = 0.139) and pathomics (training: P = 0.029; test: P = 0.354) models, whereas the combined model demonstrated good calibration for benign–malignant classification in both cohorts (training: P = 0.297; test: P = 0.088).
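The Hosmer–Lemeshow statistic groups predictions into risk deciles and compares observed with expected event counts via a chi-square test. A compact sketch (g = 10 groups and df = g − 2 follow the usual convention; the data are synthetic and well calibrated by construction):

```python
# Hosmer-Lemeshow test sketch: bin predictions into deciles of risk and
# compare observed vs expected event counts with a chi-square statistic.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, g=10):
    order = np.argsort(y_prob)
    bins = np.array_split(order, g)            # g roughly equal-sized risk groups
    stat = 0.0
    for b in bins:
        obs, exp = y_true[b].sum(), y_prob[b].sum()
        n = len(b)
        if 0 < exp < n:                        # skip degenerate bins
            stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    return stat, chi2.sf(stat, df=g - 2)       # df = g - 2 by convention

rng = np.random.default_rng(7)
p = rng.random(200)
y = (rng.random(200) < p).astype(int)          # well calibrated by construction
stat, pval = hosmer_lemeshow(y, p)
print(f"HL statistic = {stat:.2f}, P = {pval:.3f}")
```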
DCA Decision curve analysis (DCA) for the training and test sets is presented in Fig. 10D, E. The fusion model yields a clear net benefit across a wide range of threshold probabilities and, compared with the other signatures, offers the greatest potential net benefit (Fig. 10).
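Decision curve analysis computes, at each threshold probability pt, the net benefit TP/N − (FP/N) · pt/(1 − pt), benchmarked against treat-all and treat-none strategies. A minimal sketch on synthetic predictions:

```python
# Net-benefit computation underlying a decision curve. Synthetic labels and
# predicted probabilities; thresholds pt are illustrative.
import numpy as np

def net_benefit(y_true, y_prob, pt):
    treat = y_prob >= pt                        # patients "treated" at this threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - (fp / n) * pt / (1 - pt)    # harm-weighted false positives

rng = np.random.default_rng(8)
y = rng.integers(0, 2, 100)
p = np.clip(y * 0.5 + rng.random(100) * 0.5, 0, 1)

for pt in (0.2, 0.4, 0.6):
    nb_model = net_benefit(y, p, pt)
    nb_all = net_benefit(y, np.ones(100), pt)   # "treat all" reference strategy
    print(f"pt={pt}: model NB={nb_model:.3f}, treat-all NB={nb_all:.3f}")
```

The "treat none" strategy has net benefit 0 at every threshold, which is the second reference line on a decision curve.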
Fig. 10.
Constructed multiparametric pathomics nomogram and calibration curves. A Clinical–pathological fusion nomogram for differentiating benign and malignant PPLs. B, C Calibration curves of the model signatures in the training (B) and test (C) cohorts. D, E Decision curves of the model signatures in the training (D) and test (E) cohorts
Predictive performance of combined pathomics models in LungPro-negative peripheral lesions
Among the 50 samples with negative LungPro diagnostic results, some cases lacked definitive diagnoses due to limitations in biopsy or the inability to obtain immunohistochemical results through lavage or brushing. These cases were subsequently confirmed through surgery, percutaneous lung biopsy, or ≥ 3-month CT follow-up showing lesion resolution. In the comparison of the three models, although the combined model did not demonstrate statistically significant superiority over the clinical or pathomics models on the test set based on the DeLong test, it consistently exhibited superior performance across key metrics (AUC, specificity, PPV). Furthermore, it demonstrated unique clinical utility as evidenced by decision curve analysis (DCA) and calibration curves. Consequently, the combined model was selected for predicting malignancy within the LungPro diagnosis-negative cohort in this study.
Applying this combined model to the diagnosis-negative samples for benign versus malignant classification yielded successful classification of 20/28 (71.43%) malignant samples and 15/22 (68.18%) benign samples (Table 8). Figure 11 illustrates the corresponding ROC curve, achieving an AUC of 0.784 (95% CI 0.641–0.927). The cutoff value for distinguishing benign from malignant lesions was determined based on the Youden index (0.647).
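Cutoff selection by the Youden index can be sketched as follows. The labels and probabilities here are toy values; the study's 0.647 cutoff comes from its own training cohort.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical model outputs; the cohort's probabilities are not public.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.3, 0.35, 0.6, 0.65, 0.8, 0.9, 0.55, 0.7, 0.2])

fpr, tpr, thr = roc_curve(y_true, y_prob)
youden = tpr - fpr               # Youden J = sensitivity + specificity - 1
cutoff = thr[np.argmax(youden)]  # threshold maximizing J
```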
Table 8.
Performance of the combined pathomics model applied to the LungPro-negative diagnosis data set
| Combined model | Predicted malignant | Predicted benign |
|---|---|---|
| Real malignant | 20 | 8 |
| Real benign | 7 | 15 |
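The reported sensitivity (71.43%) and specificity (68.18%) follow directly from Table 8's counts:

```python
# Sensitivity/specificity implied by Table 8's confusion matrix.
tp, fn = 20, 8   # real malignant: predicted malignant / predicted benign
fp, tn = 7, 15   # real benign:    predicted malignant / predicted benign

sensitivity = tp / (tp + fn)  # 20/28
specificity = tn / (tn + fp)  # 15/22
ppv = tp / (tp + fp)          # 20/27
npv = tn / (tn + fn)          # 15/23
print(f"{sensitivity:.4f} {specificity:.4f}")  # 0.7143 0.6818
```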
Fig. 11.
ROC curve of the combined model in the LungPro diagnosis-negative cohort
For samples successfully predicted as malignant by the model, two pathologists independently delineated regions of interest (ROIs) corresponding to the model-predicted lesion areas. These ROIs were then compared with the class activation mapping (CAM) heatmaps generated by the model for the same regions (see Supplementary Figure 2). Subsequent consensus review of the high-resolution pathology images identified 6 cases exhibiting definitive malignant pathological features, including nuclear membrane irregularity (Ashworth criteria Grades 2–3), conspicuous nucleoli (diameter > 2 μm), increased nuclear-to-cytoplasmic ratio, and nuclear distortion.
Discussion
Peripheral pulmonary lesions (PPLs), particularly those less than 3 cm in diameter, pose a significant diagnostic challenge in clinical practice. With the widespread implementation of lung cancer screening, the accurate diagnosis of PPLs, including pulmonary nodules, has emerged as a significant clinical challenge. The precise qualitative assessment of PPLs or the estimation of malignancy probability in indeterminate nodules is critical for guiding patient management and optimizing clinical outcomes [24].
Conventional bronchoscopy often fails to reach these distal lesions, necessitating the development of advanced techniques such as navigational bronchoscopy. This technology has revolutionized the approach to PPLs by providing access to previously unreachable areas [25], offering a safer alternative to percutaneous lung puncture with reduced risks of complications, such as bleeding and pneumothorax [26]. Despite its high navigation success rate of 95%, the diagnostic yield remains suboptimal at around 70% [27], underscoring the need for improved diagnostic accuracy. While factors such as lesion size, character, and the presence of a bronchus sign correlate with positive diagnostic yield [28, 29], the spatial relationship between lesions and bronchi has been less explored. This study addresses this gap, demonstrating that LungPro-guided bronchoscopic biopsy is more likely to yield a positive result when PPLs are located beyond the fifth-generation bronchi and connected to three or more bronchi (Table 2). These spatial characteristics can serve as predictive indicators for LungPro’s diagnostic yield, aiding clinicians in selecting optimal diagnostic methods for PPLs. In addition, to further enhance LungPro’s diagnostic efficacy, combining radial endobronchial ultrasound (r-EBUS) [30] and C-arm fluoroscopy [31] can improve guidance to the target lesion, thereby increasing the positive biopsy rate. It is important to note, however, that even successful navigation and sampling can face diagnostic challenges due to insufficient sample size, poor tumor differentiation, and pathological difficulties stemming from intratumoral heterogeneity, which remain significant factors in failed bronchoscopic diagnosis of malignancy [32].
As the most recent development in pathological image analysis, pathomics has achieved several breakthroughs in the analysis of lung cancer pathology images, assisting pathologists with challenging diagnostic tasks such as tumor identification, treatment response assessment, and tumor microenvironment analysis. For example, Nibid et al. [33] used deep learning-based pathomics to predict the response of stage III non-small cell lung cancer (NSCLC) to treatment, accurately classifying 8/12 responders and 10/11 non-responders. In addition, Wang et al. [34] developed a system to predict the risk of recurrence in early stage NSCLC, with accuracy ranging from 75% to 82%. Rakaee et al. [35] constructed a machine learning-based tumor-infiltrating lymphocyte (TIL) scoring system to predict the response of NSCLC to immune checkpoint inhibitor therapy. These studies show that HE-stained slides contain rich and complex histological information; applying deep learning to histopathological images can help physicians uncover hidden patterns and features, providing additional interpretative information for clinical practice.
In light of this, this study constructed and validated a pathomics diagnostic model based on HE samples obtained from LungPro navigational bronchoscopy, aiming to identify subtle malignant pathological features that are difficult for pathologists to detect with the naked eye, thereby improving the diagnostic efficacy of LungPro navigational bronchoscopy for malignant PPLs. The study utilized the classic convolutional neural network ResNet18 to extract pathological features from patches of whole slide images (WSIs) and employed weakly supervised learning for data training. The ResNet18 model achieved AUC values of 0.947 (95% CI 0.9465–0.9481) and 0.790 (95% CI 0.7872–0.7923) in the training and test sets, respectively. To address the issue of conflicting predictions from multiple pathological images of the same patient, the study introduced a fusion strategy based on multiple instance learning (MIL), which improved model performance and data efficiency through instance clustering and attention pooling. Subsequently, three machine learning algorithms were applied to the MIL-aggregated data set to train the optimal diagnostic model. ROC curve analysis revealed that the ExtraTrees algorithm performed best, with patient-level AUC values of 0.792 (0.680–0.903) and 0.777 (0.531–1.000) in the training and test sets, respectively. In addition, through univariate and multivariate logistic regression analysis, the study identified age, lesion boundary and mean CT attenuation as independent risk factors for predicting malignant pulmonary nodules. Finally, by integrating these clinical risk factors with pathological features, a comprehensive diagnostic model for pulmonary nodules was constructed. This combined model achieved AUC values of 0.909 (0.8115–1.0000) and 0.848 (0.6945–1.0000) in the training and test sets, respectively, demonstrating its significant potential in enhancing diagnostic performance.
Utilizing the integrated clinicopathomics diagnostic model, this study successfully identified 20 of 28 (71.43%) malignant lesions and 15 of 22 (68.18%) benign lesions within the LungPro navigation biopsy-negative cohort. Figure 12 demonstrates the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the combined model across diagnostic thresholds. Critically, at the Youden index-derived cutoff of 0.647, the model achieved sensitivity, specificity, PPV, and NPV all exceeding 0.7 in the training cohort. Consequently, we adopted this statistically optimized threshold as the primary cutoff while noting that clinical implementation may warrant context-specific threshold adjustments to maximize diagnostic utility. The clinical significance was further confirmed by contrasting the pathologists' ROI annotations with the model's integrated Grad-CAM heatmaps in Supplementary Fig. 2. In that figure, panels A–D depict tuberculous lesions; the pathologist initially misconstrued normal tissue cells in panel D as cancerous, a misjudgment rectified to a benign diagnosis after review of the model's output. In addition, panels O–U comprise lesions first reported as negative by LungPro but eventually verified as malignant via surgical intervention or percutaneous lung biopsy; the model accurately identified these lesions as cancerous. On in-depth re-examination, pathologists confirmed the malignancy of these lesions, marked by a high nuclear-to-cytoplasmic ratio, nuclear distortion, nuclear membrane irregularity, and conspicuous nucleoli. Remarkably, for panel U, the patient received a post-surgery diagnosis of spindle cell carcinoma; the pathologist initially deemed the biopsy sample benign, yet the model correctly detected malignant features.
The integrated model developed in this study demonstrates dual clinical utilities: for pathologically confirmed positive cases, Grad-CAM visualization assists pathologists in localizing subtle pathological features frequently overlooked or misinterpreted in conventional microscopy, thereby providing evidence for diagnostic adjustment and enhancing characterization of poorly differentiated tumors, whereas for diagnostically negative cases, the model generates risk stratification scores that inform precision clinical decision-making, guiding personalized follow-up intervals to prevent loss to follow-up from false-negative results—ultimately avoiding tumor progression and ensuring timely therapeutic intervention [36]. Furthermore, the model demonstrates platform-agnostic transferability to diverse navigational bronchoscopy systems beyond LungPro—including electromagnetic navigation bronchoscopy (ENB) [37] and robotic bronchoscopy platforms [38]—thereby significantly expanding its clinical deployment flexibility and diagnostic universality.
Fig. 12.
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the clinical (a training set; b testing set), pathomics (c training set; d testing set), and combined (e training set; f testing set) prediction models at different cutoff values
Although the integrated pathomics diagnostic model developed in this study demonstrates significant potential for improving the diagnostic efficacy of LungPro navigational bronchoscopy, several limitations persist. First, the model was trained and validated using single-center data comprising pulmonary nodules with a mean diameter of 31.98 ± 12.85 mm. This constrained sample size and diversity may limit the model’s generalizability to other populations or healthcare settings, potentially introducing selection bias and spectrum bias. To mitigate these limitations, we implemented a random allocation strategy for the training and validation data sets (Table 3), with all baseline variables showing P values > 0.05 indicating balanced distribution. In addition, fivefold cross-validation was employed during model training to reduce the risk of overfitting to specific data subsets. Second, the model relies on high-quality pathological and imaging data, but in real-world clinical practice, image quality may be affected by factors, such as sampling techniques, staining variations, or imaging equipment, which could compromise its generalizability across diverse clinical settings. In addition, the decision-making mechanism of the model in distinguishing between benign and malignant peripheral pulmonary lesions is not fully transparent, and its internal feature extraction and classification logic exhibit a degree of unexplainability, which may lead to missed diagnoses or misclassifications in some cases. Finally, although the model demonstrates good classification performance in the diagnostic-negative group, its reliability and stability need further validation through larger scale, multi-center studies to assess its applicability across different clinical scenarios.
Future studies should focus on: broadening data sets across institutions and regions to improve model generalizability, supported by prospective cohort validation; establishing uniform data collection protocols to ensure consistent quality; incorporating explainable AI techniques to improve model interpretability; and building multimodal diagnostic frameworks that integrate histopathological imagery, radiological information, clinical factors, and molecular markers. Essential next steps include direct comparisons with existing diagnostic pathways and quantification of clinical benefit using measures such as diagnostic yield and time to diagnosis.
Methods
Study design
In this study, we developed a systematic methodology for predictive modeling in medical research using pathological data. The process begins with segmenting WSIs into smaller, more manageable patches, which are analyzed using CNNs [39, 40] for patch recognition. A multi-instance learning framework [41] then aggregates the extracted features into a comprehensive data set. Following the pathological analysis, we use univariable and multivariable logistic regression to identify significant clinical risk factors associated with malignant peripheral pulmonary lesions. Finally, machine learning algorithms are applied to develop predictive models that assess the likelihood of benign or malignant outcomes from the integrated pathological features and clinical factors. This approach provides a robust framework for medical image analysis and predictive modeling. The workflow of this study is illustrated in Figure 1.
Patient and data source
This single-center retrospective study was conducted at the Second Affiliated Hospital of Chongqing Medical University. The initial cohort comprised 218 patients (225 peripheral pulmonary lesions, PPLs) who underwent LungPro-guided navigational bronchoscopy with pathological sampling (lavage, brushing, and/or forceps biopsy) between January 2022 and December 2023. Exclusion criteria were as follows: (1) metastatic or non-primary lung malignancies; (2) lesions localized in the third-order bronchi or more proximally; (3) loss to follow-up or stable lesion size (no significant change over > 3 months of observation); and (4) pathological specimens inadequate for histopathological analysis.
After screening, 144 eligible PPL patients were enrolled (Fig. 2). Clinical data, navigation records, and hematoxylin–eosin (HE)-stained slides were collected. Specimens were stratified into two groups: diagnosis-positive group (confirmed malignancies/specific benign pathologies via cytopathology/NGS/microbiology) and diagnosis-negative group (non-diagnostic initial results but confirmed through ≥ 3-month CT follow-up or invasive verification).
Prior to model training, the diagnosis-positive cohort was randomly partitioned into training and test sets at a 7:3 ratio. To maintain patient-level data integrity, all samples from the same patient were consistently allocated to the same subset. This partitioning process was iteratively repeated until inter-set comparisons of all clinical–pathological features demonstrated no significant differences (P > 0.05, two-sample t test), thereby ensuring statistically unbiased cohort division. The diagnosis-negative cohort served as an independent external test set to evaluate model generalizability. This retrospective study was approved by the Ethics Committee of the Second Affiliated Hospital of Chongqing Medical University (Approval No. 65) with waived informed consent.
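The patient-level, balance-checked split can be sketched with scikit-learn's `GroupShuffleSplit` (synthetic data; the redraw-until-balanced loop mirrors the iterative repartitioning described above, checking a single feature for brevity):

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(40), 2),  # two slides per patient
    "age": np.repeat(rng.normal(60, 10, 40), 2),
})

# Keep all samples from one patient in the same subset; redraw until the
# feature distribution is balanced (two-sample t test, P > 0.05).
for seed in range(100):
    gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=seed)
    tr, te = next(gss.split(df, groups=df["patient_id"]))
    _, p = ttest_ind(df["age"].iloc[tr], df["age"].iloc[te])
    if p > 0.05:
        break
```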
Pathology procedures
Image segmentation and preprocessing methods
Our whole slide image processing pipeline addressed computational challenges through three key steps. First, 20× WSIs were tiled into 512 × 512 pixel patches (0.5 µm/pixel resolution) with white-background exclusion. Second, we standardized staining variations using Macenko normalization. Third, a weakly supervised framework [42, 43] assigned patient-level diagnostic labels to all tissue patches, eliminating manual annotation while preserving clinical context and data integrity.
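The tiling and white-background exclusion step can be sketched as follows. This is a NumPy-only toy on an in-memory array (the study read WSIs with OpenSlide); the 220-intensity and 80% white-fraction thresholds are assumptions for illustration.

```python
import numpy as np

def tile_wsi(image, tile=512, white_thresh=0.8):
    """Split an RGB array into tile x tile patches, dropping mostly-white
    (background) ones, as in the preprocessing step described above."""
    h, w, _ = image.shape
    patches = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = image[y:y + tile, x:x + tile]
            white_frac = np.mean(patch.mean(axis=2) > 220)  # near-white pixels
            if white_frac < white_thresh:                   # keep tissue tiles
                patches.append(patch)
    return patches

# Toy slide: left half "tissue" (pink-ish), right half white background.
slide = np.full((512, 1024, 3), 255, dtype=np.uint8)
slide[:, :512] = (200, 120, 150)
kept = tile_wsi(slide)  # only the tissue tile survives
```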
Patch-level prediction
In our deep learning framework, we employed a dual strategy: patch-level prediction followed by fusion of whole slide image (WSI) features through multi-instance learning. For training, each patch inherited a binary label from its patient's diagnostic status; consequently, all patches belonging to a given patient were identically labeled.
Model training For patch-level prediction, we explored several prominent CNN backbones, including ResNet50, ResNet18, and DenseNet121. Comparative analyses identified ResNet18 as the most suitable model for our needs, and it was adopted for training. Further training details are documented in Supplementary 1A.
Multi-instance learning fusion
Multi-instance learning-based feature fusion Following the training phase, we entered the prediction stage, where each patch was assigned labels and associated probabilities. These patch-level probabilities were aggregated using multi-instance learning approaches to synthesize features at the whole slide image (WSI) level. We implemented two distinct methodologies for this integration, as detailed in Supplementary 1B:
Patch likelihood histogram (PLH) pipeline This method employs a histogram to depict the distribution of patch probabilities across a WSI, providing a comprehensive representation that captures the full spectrum of likelihoods within the slide.
Bag-of-words (BoW) pipeline This approach combines histogram-based and vocabulary-based techniques, using term frequency–inverse document frequency (TF–IDF) mapping for each patch. The resultant TF–IDF feature vectors effectively characterize the WSI's features.
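The PLH aggregation can be sketched as a normalized histogram over patch probabilities, yielding one fixed-length feature vector per WSI (the bin count and helper name are assumptions):

```python
import numpy as np

def patch_likelihood_histogram(patch_probs, n_bins=10):
    """Aggregate patch-level malignancy probabilities into a fixed-length
    WSI-level feature vector (normalized histogram), per the PLH idea."""
    hist, _ = np.histogram(patch_probs, bins=n_bins, range=(0.0, 1.0))
    return hist / max(len(patch_probs), 1)

# Two toy slides: one bimodal (mixed tissue), one concentrated near 0.5.
wsi_a = patch_likelihood_histogram([0.1, 0.15, 0.9, 0.95, 0.92])
wsi_b = patch_likelihood_histogram([0.45, 0.5, 0.55])
```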
Feature selection From the multi-instance learning process, a total of 206 features were aggregated, including 11 probability features and 2 predictive label features from various processes. To optimize this feature set, we applied Lasso regression for feature selection, focusing on reducing redundancy and enhancing model interpretability. This refinement led to the selection of key features that were then used to build predictive models using machine learning algorithms, such as logistic regression, RandomForest, and ExtraTrees.
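The Lasso selection step can be sketched on synthetic data (the study's 206-feature table is not public, and the `alpha` value here is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso

# Synthetic stand-in for the 206 MIL-aggregated features.
X, y = make_classification(n_samples=120, n_features=206, n_informative=10,
                           random_state=0)

# L1 penalty drives uninformative coefficients to exactly zero.
lasso = Lasso(alpha=0.02, max_iter=5000).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of retained features
X_reduced = X[:, selected]              # input to the downstream ML models
```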
Model building
Pathology model Built from the combined patch-level predictions, probability histograms, and TF–IDF attributes to form patient-level profiles.
Clinical model In parallel with the pathology model, a separate machine learning model was trained on clinical characteristics to predict diagnoses.
Combined model Clinical features of statistical significance (P < 0.05), identified via univariable/multivariable analyses, were amalgamated with the pathology model to form a unified model. For better clinical applicability, the combined model was visualized as a nomogram, enhancing its clarity and easing its use in clinical settings.
Metrics To assess the efficacy of the developed predictive models, we employed ROC (receiver operating characteristic) curves, DCA (decision curve analysis) curves, and calibration curves.
Statistical analysis
To evaluate the distribution of clinical features across the cohorts, we first assessed normality using the Shapiro–Wilk test. Based on the normality test results, appropriate statistical tests [independent t tests, Mann–Whitney U tests, or chi-squared (χ²) tests] were selected to compare baseline characteristics between the training and validation sets. The detailed baseline characteristics of the patient cohorts are summarized in Table 1. To identify significant clinical predictors of malignant pulmonary nodules, we performed univariable and multivariable logistic regression analyses. For pathological analysis and machine learning model development, we employed Python 3.7.12, utilizing the following libraries: scikit-learn (v1.0.2) for machine learning algorithms; Slideflow (v2.1.0) for pathology image analysis; Pandas (v1.2.4) for data manipulation; NumPy (v1.20.2) for numerical operations; PyTorch (v1.8.0) for deep learning; OpenSlide (v1.2.0) for whole-slide image processing; and SciPy (v1.7.3) for scientific computing.
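The normality-driven choice of test can be sketched with SciPy (synthetic data; `compare_groups` is an illustrative helper, not the study's code):

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu, shapiro, ttest_ind

def compare_groups(a, b, alpha=0.05):
    """Pick t test vs Mann-Whitney U based on Shapiro-Wilk normality, as in
    the baseline-characteristic comparisons described above."""
    normal = shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha
    if normal:
        return "t-test", ttest_ind(a, b).pvalue
    return "mannwhitneyu", mannwhitneyu(a, b).pvalue

rng = np.random.default_rng(1)
name, p = compare_groups(rng.normal(60, 10, 50), rng.normal(61, 10, 50))

# Categorical features use the chi-squared test on a contingency table.
_, chi_p, _, _ = chi2_contingency([[30, 20], [25, 25]])
```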
Supplementary Information
Author contributions
F.Y. designed the study and drafted the manuscript; Y.B. and X.M. performed clinical data curation and validation; Y.T. conducted systematic analysis of histopathological slides and interpreted the Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations derived from the deep learning model; S.L. supervised the study and critically revised the manuscript for intellectual content. All authors have reviewed and approved the manuscript.
Funding
This work was supported by the Chongqing Municipal Natural Science Foundation (CSTB2023NSCQ–MSX0454) and the Clinical Technology Innovation Program of the Second Affiliated Hospital of Chongqing Medical University.
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee. We have obtained the ethical approval for data from patients with peripheral pulmonary lesions undergoing LungPro navigation from the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China (Approval No. 65).
Informed consent
Informed consent was obtained from all individual participants included in the study.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. [DOI] [PubMed] [Google Scholar]
- 2.Zheng X, Xie F, Li Y, et al. Ultrathin bronchoscope combined with virtual bronchoscopic navigation and endobronchial ultrasound for the diagnosis of peripheral pulmonary lesions with or without fluoroscopy: a randomized trial. Thorac Cancer. 2021;12(12):1864–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Kalchiem-Dekel O, Connolly JG, Lin IH, et al. Shape-sensing robotic-assisted bronchoscopy in the diagnosis of pulmonary parenchymal lesions. Chest. 2022;161(2):572–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Zhihan Z, Junbao Z, Xi C, et al. Comparison of efficacy and safety of different guided technologies combined with ultrathin bronchoscopic biopsy for peripheral pulmonary lesions. Clin Respir J. 2024;18(10):e70012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Experts consensus on transbronchial diagnosis, localization and treatment of peripheral pulmonary nodules guided by the augmented reality optical lung navigation. Zhonghua Yi Xue Za Zhi. 2024;104(16):1371–80. [DOI] [PubMed]
- 6.Felix Jf H, Ralf E, Maren S. Bronchoscopy in lung cancer: navigational modalities and their clinical use. Expert Rev Respir Med. 2016;10(8):901–6. [DOI] [PubMed] [Google Scholar]
- 7.Mori K, Deguchi D, Sugiyama J, et al. Tracking of a bronchoscope using epipolar geometry analysis and intensity-based image registration of real and virtual endoscopic images. Med Image Anal. 2002;6(3):321–36. [DOI] [PubMed] [Google Scholar]
- 8.Chen A, Pastis N, Furukawa B, et al. The effect of respiratory motion on pulmonary nodule location during electromagnetic navigation bronchoscopy. Chest. 2015;147(5):1275–81. [DOI] [PubMed] [Google Scholar]
- 9.Liam CK, Lee P, Yu CJ, et al. The diagnosis of lung cancer in the era of interventional pulmonology. Int J Tuberc Lung Dis. 2021;25(1):6–15. [DOI] [PubMed] [Google Scholar]
- 10.Nizzoli R, Tiseo M, Gelsomino F, et al. Accuracy of fine needle aspiration cytology in the pathological typing of non-small cell lung cancer. J Thorac Oncol. 2011;6(3):489–93. [DOI] [PubMed] [Google Scholar]
- 11.Caroline LC, Louise JS, Salmah B, et al. Quantitative analysis of tumor in bronchial biopsy specimens. J Thorac Oncol. 2010;5(4):448–52. [DOI] [PubMed] [Google Scholar]
- 12.Rikiya Y, Mizuho N, Richard Kinh Gian D, et al. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9(4):611–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Ayyad SM, Shehata M, Shalaby A, et al. Role of AI and histopathological images in detecting prostate cancer: a survey. Sensors. 2021;21(8):2586. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Huang Y, Yao Z, Li L, et al. Deep learning radiopathomics based on preoperative US images and biopsy whole slide images can distinguish between luminal and non-luminal tumors in early-stage breast cancers. EBioMedicine. 2023;94:104706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Krishna S, Abdelbaki A, Hart PA, et al. Endoscopic ultrasound-guided needle-based confocal endomicroscopy as a diagnostic imaging biomarker for intraductal papillary mucinous neoplasms. Cancers. 2024;16(6):1238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Xiyue W, Junhan Z, Eliana M, et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634(8035):970–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Selvaraju RR, Cogswell M, Das A, et al. Grad-cam: visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2020;128(2):336–59. [Google Scholar]
- 18.Canfora I, Cutaia G, Marcianò M, et al. A predictive system to classify preoperative grading of rectal cancer using radiomics features. Lect Notes Comput Sci. 2022;13373:431–40. [Google Scholar]
- 19.Alvarez-Jimenez C, Sandino AA, Prasanna P, et al. Identifying cross-scale associations between radiomic and pathomic signatures of non-small cell lung cancer subtypes: preliminary results. Cancers. 2020;12(12):3663. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Nicolas C, Paolo Santiago O, Theodore S, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24(10):1559–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Ninomiya H, Hiramatsu M, Inamura K, et al. Correlation between morphology and EGFR mutations in lung adenocarcinomas Significance of the micropapillary pattern and the hobnail cell type. Lung Cancer. 2008;63(2):235–40. [DOI] [PubMed] [Google Scholar]
- 22.Seref S, Ramazan T, Emrah C, et al. A novel brain tumor magnetic resonance imaging dataset (Gazi Brains 2020): initial benchmark results and comprehensive analysis. PeerJ Comput Sci. 2025;11:e2920. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Yusuf AM, Bee EK, Mohd SMA, et al. Decoding the black box: explainable AI (XAI) for cancer diagnosis, prognosis, and treatment planning-A state-of-the art systematic review. Int J Med Inform. 2024;193:105689. [DOI] [PubMed] [Google Scholar]
- 24.Haiquan C, Anthony WK, Michael H, et al. The 2023 American association for thoracic surgery (AATS) expert consensus document: management of subsolid lung nodules. J Thorac Cardiovasc Surg. 2024;168(3):631–47. [DOI] [PubMed] [Google Scholar]
- 25.Nadig TR, Thomas N, Nietert PJ, et al. Guided bronchoscopy for the evaluation of pulmonary lesions: an updated meta-analysis. Chest. 2023;163(6):1589–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Vachani A, Zhou M, Ghosh S, et al. Complications after transthoracic needle biopsy of pulmonary nodules: a population-level retrospective cohort analysis. J Am Coll Radiol: JACR. 2022;19(10):1121–9. [DOI] [PubMed] [Google Scholar]
- 27.Wang MJ, Nietert PJ, Silvestri GA. Meta-analysis of guided bronchoscopy for the evaluation of the pulmonary nodule. Chest. 2012;142(2):385–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Kitamura A, Tomishima Y, Imai R, et al. Findings of virtual bronchoscopic navigation can predict the diagnostic rate of primary lung cancer by bronchoscopy in patients with peripheral lung lesions. BMC Pulm Med. 2022;22(1):270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Hiddinga BI, Slebos DJ, David KT, et al. The additional diagnostic value of virtual bronchoscopy navigation in patients with pulmonary nodules—The NAVIGATOR study. Lung Cancer. 2023;177:37–43. [DOI] [PubMed] [Google Scholar]
- 30.Shuhong G, Xiaoli X, Xiaoqin Z, et al. Diagnostic value of rEBUS-TBLB combined distance measurement method based on ultrasound images in bronchoscopy for peripheral lung lesions. SLAS Technol. 2024;29(6):100198. [DOI] [PubMed] [Google Scholar]
- 31.Chen J, Xie F, Zheng X, et al. Mobile 3-dimensional (3D) C-arm system-assisted transbronchial biopsy and ablation for ground-glass opacity pulmonary nodules: a case report. Translat Lung Cancer Res. 2021;10(7):3312–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Nishii Y, Yasuma T, Ito K, et al. Factors leading to failure to diagnose pulmonary malignant tumors using endobronchial ultrasound with guide sheath within the target lesion. Respir Res. 2019;20(1):207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Nibid L, Greco C, Cordelli E, et al. Deep pathomics: a new image-based tool for predicting response to treatment in stage III non-small cell lung cancer. PLoS ONE. 2023;18(11):e0294259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Wang X, Janowczyk A, Zhou Y, et al. Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images. Sci Rep. 2017;7(1):13543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Rakaee M, Adib E, Ricciuti B, et al. Association of machine learning-based assessment of tumor-infiltrating lymphocytes on standard histologic images with outcomes of immunotherapy in patients with NSCLC. JAMA Oncol. 2023;9(1):51–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Yu W, Ma H, Yu G, et al. Non-diagnostic electromagnetic navigation bronchoscopy biopsy: predictive factors and final diagnoses. Oncol Lett. 2023;25(4):166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Mehta AC, Hood KL, Schwarz Y, et al. The evolutional history of electromagnetic navigation bronchoscopy. Chest. 2018;154(4):935–47. [DOI] [PubMed] [Google Scholar]
- 38.Chaddha U, Kovacs SP, Manley C, et al. Robot-assisted bronchoscopy for pulmonary lesion diagnosis: results from the initial multicenter experience. BMC Pulm Med. 2019;19(1):243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Zewen L, Fan L, Wenjie Y, et al. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst. 2021;33(12):107034. [DOI] [PubMed] [Google Scholar]
- 40.Muhammad A, Viviana B, Ghazal B, et al. Applications of artificial intelligence, deep learning, and machine learning to support the analysis of microscopic images of cells and tissues. J Imaging. 2025;11(2):59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Jikai Y, Hongda C, Lianxin H, et al. Exploring multi-instance learning in whole slide imaging: current and future perspectives. Pathol Res Pract. 2025;271:156006. [DOI] [PubMed] [Google Scholar]
- 42.Samuel B, Emma CR, Bernhard K. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med Image Anal. 2021;71(0):102062. [DOI] [PubMed] [Google Scholar]
- 43.Mahnaz M, Jessica C, Ognjen A, et al. Weakly supervised learning and interpretability for endometrial whole slide image diagnosis. Exp Biol Med. 2022;247(22):2025–37. [DOI] [PMC free article] [PubMed] [Google Scholar]