Author manuscript; available in PMC: 2025 Jun 3.
Published in final edited form as: Cancer Biomark. 2024 Feb 6;42(1):CBM230360. doi: 10.3233/CBM-230360

Table 1.

Selected studies on radiomics-based risk stratification of pulmonary nodules

Each entry lists: publication; study design and objective; populations or datasets; model and analytical details; key results.

Way et al, 2006 [51]
Study design and objective: Analytical validation study to develop a CAD model and assess performance of image segmentation.
Populations or datasets: Training data: 96 PNs (4–60 mm; 46% malignant) from 58 pts at the University of Michigan. Validation data for segmentation: experienced radiologists’ segmentation of 23 PNs from LIDC.
Model and analytical details: 3D active contour segmentation with manual feature extraction, selection, and classification. CAD model trained and tested using a leave-one-case-out resampling scheme.
Key results: AUC = 0.83. Model-segmented PN volumes were greater than those outlined by LIDC radiologists.
Way et al, 2009 [52]
Study design and objective: Analytical validation study to refine the above CAD model.
Populations or datasets: Training data: 256 PNs (3–38 mm; 48% malignant) from 152 pts at the University of Michigan.
Model and analytical details: Novel PN surface features characterizing smoothness and shape irregularity added to the CAD model (described above). Demographics (age, gender) and an LDA classifier also assessed.
Key results: AUC = 0.86 with addition of novel PN surface features. No significant difference in CAD model performance when demographic features or the LDA classifier were included.
Way et al, 2010 [53]
Study design and objective: Retrospective multi-reader, multi-case study to assess the effect of the above CAD model on radiologists’ performance discriminating between malignant and benign PNs.
Populations or datasets: Reader study: 6 fellowship-trained thoracic radiologists evaluated 256 PNs (3–38 mm; 48% malignant) from 152 pts at the University of Michigan.
Model and analytical details: CAD model (described above). Model output = relative malignancy rating on a scale of 1 to 10, representing a 10-bin histogram of scores with fitted Gaussian distributions for malignant and benign PNs.
Key results: CAD AUC = 0.86. Average radiologists’ AUC increased from 0.83 to 0.85 with CAD.
Huang et al, 2018 [54]
Study design and objective: Analytical validation study using matched case-control data to derive and evaluate a novel CAD model.
Populations or datasets: Training data: 140 PNs (4–20 mm; 50% malignant) from 140 pts in the NLST. Validation data: 46 PNs (4–20 mm; 43% malignant) from 46 pts in the NLST. All pts underwent lung biopsy. Malignant and benign PNs were matched based on demographic, clinical, and PN variables.
Model and analytical details: Image processing and feature extraction performed by expert radiologists. Random forest machine learning algorithm used to select variables and develop the CAD model.
Key results: Validation cohort: CAD AUC = 0.92. CAD: Sn = 0.95, Sp = 0.88, PPV = 0.86, NPV = 0.96. Three radiologists’ combined reading: Sn = 0.70, Sp = 0.69, PPV = 0.64, NPV = 0.75.
Peikert et al, 2018 [55]
Study design and objective: Analytical validation study to develop and internally validate a radiomics-based multivariable model (BRODERS model).
Populations or datasets: Training data: 726 PNs (7–30 mm; 56% malignant) from 726 pts in the NLST.
Model and analytical details: PNs segmented manually using ANALYZE software (Mayo Clinic Biomedical Imaging Resource) and radiomic features extracted. LASSO multivariable analysis used to develop the final model.
Key results: Optimism-corrected AUC for the final 8-variable BRODERS model = 0.94.
Maldonado et al, 2020 [56]
Study design and objective: Analytical validation study to externally validate the BRODERS model.
Populations or datasets: External validation data: 170 PNs (7–30 mm; 54% malignant) from 170 consecutive pts with incidentally detected PNs at Vanderbilt University.
Model and analytical details: BRODERS model (described above) compared to the Brock model.
Key results: BRODERS AUC = 0.90; Brock AUC = 0.87.
Balagurunathan et al, 2019 [57]
Study design and objective: Analytical validation study using a 2:1 nested case-control design to develop a novel radiomics model.
Populations or datasets: Training data: 244 PNs (> 4 mm; 32% malignant) from 244 pts in the NLST. Validation data: 235 PNs (> 4 mm; 37% malignant) from 235 pts in the NLST. Malignant and benign PNs were matched based on demographic and clinical variables.
Model and analytical details: PNs 3D segmented by radiologists via a semi-automated algorithm, 219 quantitative features extracted, and an optimal linear classifier model used.
Key results: In both training (0.85 vs 0.80) and validation (0.88 vs 0.86) datasets, AUC was higher for the best texture feature set than for the size and shape feature set. Addition of clinical data did not significantly improve AUC.
Ardila et al, 2019 [58]
Study design and objective: Analytical validation study and retrospective reader study to develop and externally validate a novel radiomics-based AI CAD model.
Populations or datasets: Training data: 29,541 PNs (4% malignant) from the NLST. Tuning data: ~6,343 PNs (5% malignant) from the NLST. Validation data: 6,716 PNs (4% malignant) from the NLST. Reader study: 6 board-certified radiologists evaluated 507 CTs with PNs (16% malignant; subset of validation data).
Model and analytical details: CAD approach developed using the TensorFlow platform (Google Inc.), employing a 3D CNN model that performs end-to-end analysis of whole-CT volumes. Model output = LUMAS, roughly corresponding to Lung-RADS 3, 4A, and 4B/4X.
Key results: Validation cohort: AI CAD AUC = 0.94. AI CAD outperformed radiologists within each LUMAS bucket in the reader study, both when 1 CT scan was used per pt and when multiple scans were available per pt.
Venkadesh et al, 2021 [59]
Study design and objective: Analytical validation study and retrospective reader study to develop and externally validate a novel radiomics-based AI CAD model.
Populations or datasets: Training data: 16,077 PNs (> 4 mm; 8% malignant) from the NLST. Validation data from the DLCST: 883 PNs in the full cohort (7% malignant); 175 non-size-matched PNs in subset A (34% malignant); 177 size-matched PNs in subset B (33% malignant). Reader study: 11 clinicians (9 radiologists, 2 pulmonologists) evaluated PNs in cancer-enriched cohorts.
Model and analytical details: 2D CNN with ResNet50 backbone and 3D CNN based on the Inception-v1 architecture used to develop the AI CAD algorithm. Internally validated using 10-fold cross-validation. AI CAD model compared to the Brock model and clinicians. Model output = risk score from 0 to 1.
Key results: Full validation cohort: AI CAD AUC = 0.93; Brock AUC = 0.90. Subset A cohort: AI CAD AUC = 0.96; average clinician AUC = 0.90; Brock AUC = 0.94. Subset B cohort: AI CAD AUC = 0.86; average clinician AUC = 0.82; Brock AUC = 0.75.
Massion et al, 2020 [60]
Study design and objective: Analytical validation study to develop and externally validate a novel radiomics-based AI CAD model (Optellum LCP-CNN).
Populations or datasets: Training data: > 130,000 PNs (~50% malignant) from the NLST. Internal validation data: 15,693 PNs (> 6 mm; 6% malignant) from 6,547 pts in the NLST. External validation data: 116 PNs (5–30 mm; 55% malignant) from 116 pts with incidentally detected PNs at Vanderbilt University; 463 PNs (5–19 mm; 14% malignant) from 427 pts with incidentally detected PNs at Oxford University.
Model and analytical details: 2.5D CNN with DenseNet architecture (5 dense blocks), implemented in the PyTorch machine learning framework. Internally validated using 8-fold cross-validation. Model output = score between 0% and 100% representing likelihood of malignancy. Compared to the Brock and Mayo Clinic models.
Key results: Internal validation cohort: LCP-CNN AUC = 0.92; Brock AUC = 0.86; Mayo Clinic AUC = 0.85. Vanderbilt University external validation cohort: LCP-CNN AUC = 0.84; Mayo Clinic AUC = 0.78. Oxford University external validation cohort: LCP-CNN AUC = 0.92; Mayo Clinic AUC = 0.82.
Baldwin et al, 2020 [61]
Study design and objective: Analytical validation study to externally validate the Optellum LCP-CNN model.
Populations or datasets: External validation data: 1,397 PNs (5–15 mm; 17% malignant) from 1,187 U.K. pts in the IDEAL study.
Model and analytical details: Optellum LCP-CNN model (described above) compared to the Brock model.
Key results: LCP-CNN AUC = 0.87; Brock AUC = 0.83.
Kim et al, 2022 [62]
Study design and objective: Retrospective multi-reader, multi-case study to assess the effect of the Optellum AI CAD model on clinicians’ performance discriminating between malignant and benign PNs.
Populations or datasets: Reader study: 12 clinicians (6 radiologists, 6 pulmonologists) evaluated 300 CTs with PNs (5–30 mm; 50% malignant) from 300 pts from 7 sources in the U.S., U.K., and NLST.
Model and analytical details: Optellum LCP-CNN model (described above). Model output = LCP score of 1 to 10, categorizing malignancy risk on a decile scale for a population with 30% cancer prevalence.
Key results: Average clinicians’ AUC increased from 0.82 to 0.89 with AI CAD. Interobserver agreement (Fleiss kappa) improved with AI CAD for the < 5% risk (0.71 vs 0.50) and > 65% risk (0.71 vs 0.54) categories and for PN management decisions (0.52 vs 0.44).
Kim et al, 2023 [63]
Study design and objective: Secondary analysis of the above retrospective multi-reader, multi-case study to assess the effect of the Optellum AI CAD model on clinicians’ management of PNs.
Populations or datasets: Reader study: described above.
Model and analytical details: LCP score (described above). Appropriate PN management defined as surgery, biopsy, or immediate imaging for malignant PNs and imaging follow-up for benign PNs.
Key results: Average clinicians’ risk estimate without vs with AI CAD: 60% vs 69% (malignant PNs); 23% vs 21% (benign PNs). Average clinicians’ appropriate PN management without vs with AI CAD: 80% vs 84% (overall); 72% vs 81% (malignant PNs); 87% vs 89% (benign PNs).

Abbreviations: CAD = computer-aided diagnosis; PN = pulmonary nodule; pts = patients; LIDC = Lung Image Database Consortium; LDA = linear discriminant analysis; NLST = National Lung Screening Trial; AUC = area under the receiver operating characteristic curve; Sn = sensitivity; Sp = specificity; PPV = positive predictive value; NPV = negative predictive value; BRODERS = Benign Versus Aggressive Nodule Evaluation Using Radiomics Stratification; LASSO = least absolute shrinkage and selection operator; AI = artificial intelligence; CNN = convolutional neural network; CT = computed tomography; LUMAS = lung malignancy score; Lung-RADS = Lung Imaging Reporting and Data System; DLCST = Danish Lung Cancer Screening Trial; LCP-CNN = Lung Cancer Prediction Convolutional Neural Network; U.K. = United Kingdom; IDEAL = Artificial Intelligence and Big Data for Early Lung Cancer Diagnosis; U.S. = United States.
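Several studies in the table report AUC alongside sensitivity (Sn), specificity (Sp), PPV, and NPV. As a minimal illustration of how these metrics relate to a model’s malignancy scores, the sketch below computes all five from invented labels and scores; the data, threshold, and function names are hypothetical and come from none of the cited studies. AUC is computed via the rank-based Mann-Whitney formulation, which equals the area under the ROC curve.

```python
# Hypothetical sketch: discrimination metrics of the kind reported in Table 1.
# Labels, scores, and the 0.5 threshold are invented for illustration.

def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sn, Sp, PPV, NPV for binary labels (1 = malignant) at a score cutoff."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(y_true, y_score))
    fn = sum(y == 1 and s < threshold for y, s in zip(y_true, y_score))
    tn = sum(y == 0 and s < threshold for y, s in zip(y_true, y_score))
    fp = sum(y == 0 and s >= threshold for y, s in zip(y_true, y_score))
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn)

def auc(y_true, y_score):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen malignant case outscores a randomly chosen benign one (ties 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 8 nodules, model scores in [0, 1].
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
sn, sp, ppv, npv = confusion_metrics(labels, scores)  # each 0.75 here
roc_auc = auc(labels, scores)                         # 15/16 = 0.9375
```

Note that Sn/Sp/PPV/NPV depend on the chosen score threshold, whereas AUC summarizes discrimination across all thresholds, which is why the table’s AUC comparisons are threshold-free.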
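Kim et al [62] quantify interobserver agreement with Fleiss’ kappa. The sketch below is a generic, textbook implementation of that statistic, not the code used in the study; the input layout and example counts are assumptions for illustration.

```python
# Generic Fleiss' kappa: chance-corrected agreement among multiple readers.
# Input: an items x categories matrix of counts, ratings[i][j] = number of
# readers assigning nodule i to risk category j; every row must sum to the
# same number of readers. Example data below are invented.

def fleiss_kappa(ratings):
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Observed agreement: mean per-item pairwise agreement P_i.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    n_cats = len(ratings[0])
    p_e = sum(
        (sum(row[j] for row in ratings) / total) ** 2 for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)

# Invented example: 3 readers, 2 categories, perfect agreement on 4 nodules.
perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])  # 1.0
```

A kappa of 1 indicates perfect agreement beyond chance and values near 0 indicate chance-level agreement, which frames the 0.44 to 0.71 range reported in the reader study.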