Abstract.
Purpose
We aim to evaluate the performance of radiomic biopsy (RB), best-fit bounding box (BB), and a deep-learning-based segmentation method called no-new-U-Net (nnU-Net), compared to the standard full manual (FM) segmentation method for predicting benign and malignant lung nodules using a computed tomography (CT) radiomic machine learning model.
Materials and Methods
A total of 188 CT scans of lung nodules from two institutions were used for our study. One radiologist identified and delineated all 188 lung nodules, whereas a second radiologist segmented a subset (n = 20) of these nodules. Both radiologists employed FM and RB segmentation methods. BB segmentations were generated computationally from the FM segmentations. The nnU-Net, a deep-learning-based segmentation method, performed automatic nodule detection and segmentation. The time radiologists took to perform segmentations was recorded. Radiomic features were extracted from each segmentation method, and models to predict benign and malignant lung nodules were developed. The Kruskal–Wallis and DeLong tests were used to compare segmentation times and areas under the curve (AUC), respectively.
Results
For the delineation of the FM, RB, and BB segmentations, the two radiologists required a median time (IQR) of 113 (54 to 251.5), 21 (9.25 to 38), and 16 (12 to 64.25) s, respectively (p = 0.04). In dataset 1, the mean AUCs (95% CI) of the FM, RB, BB, and nnU-Net models were 0.964 (0.96 to 0.968), 0.985 (0.983 to 0.987), 0.961 (0.956 to 0.965), and 0.878 (0.869 to 0.888), respectively. In dataset 2, the mean AUCs (95% CI) of the FM, RB, BB, and nnU-Net models were 0.717 (0.705 to 0.729), 0.919 (0.913 to 0.924), 0.699 (0.687 to 0.711), and 0.644 (0.632 to 0.657), respectively.
Conclusion
Radiomic biopsy-based models outperformed FM and BB models in prediction of benign and malignant lung nodules in two independent datasets while deep-learning segmentation-based models performed similarly to FM and BB. RB could be a more efficient segmentation method, but further validation is needed.
Keywords: segmentation, lung nodules, computed tomography imaging, radiomics, machine learning, deep learning
1. Introduction
Medical imaging is essential in cancer surveillance, diagnosis, treatment, and prediction of prognosis. In recent years, computed tomography (CT) radiomic models have emerged as valuable tools in cancer research for diagnosing, monitoring treatment effects, and offering decision support, ultimately enabling personalized medicine.1–8 However, radiomic models necessitate large-scale, multi-center datasets of images with segmented 3D volumes of interest (VOIs). These VOIs define the region in which radiomic features are computed. Although VOIs can be delineated manually, semi-automatically, or automatically,9–13 full manual (FM) segmentations by board-certified radiologists remain the standard for radiomic models. FM segmentations, however, are both time-consuming14 and labor-intensive for radiologists in clinical settings. Consequently, most published radiomic models are trained, validated, and tested on single-center data with small sample sizes.15 There is a need for a simple, efficient segmentation method that does not compromise the effectiveness of radiomic models.
This study compares radiomic biopsy (RB), best-fit bounding box (BB), and no-new-U-Net (nnU-Net) segmentations with the standard FM segmentations in a CT radiomic machine learning model for predicting benign and malignant lung nodules. We aim to assess whether RB, BB, and/or nnU-Net segmentation methods can be viable and time-efficient alternatives to FM segmentations.
2. Materials and Methods
This retrospective study was approved by the Institutional Review Board with a waiver of informed consent. Radiomic features were extracted from the FM, RB, BB, and nnU-Net segmentations. For each segmentation method, the performance of the CT radiomic model to predict benign and malignant lung nodules was evaluated.
2.1. Data Collection
We collected clinical data and CT scans for subjects exhibiting solid benign or malignant lung nodules between December 2015 and 2018 from two separate institutions. Dataset 1 consisted of CT scans acquired from institution 1, whereas dataset 2 comprised CT scans obtained from institution 2.
The inclusion criteria were (1) the presence of a solid lung nodule and (2) a pathologically confirmed diagnosis of benign or malignant. The exclusion criteria were (1) no chest CT scan before diagnosis, (2) a low-dose CT scan defined as a dose length product , (3) no soft-tissue kernel image reconstruction, (4) thick-slice contiguous axial images , (5) lung nodules indistinct from atelectasis, mediastinum, or consolidation, (6) lung nodules in largest diameter, (7) lung nodules predominantly ground glass or cystic, or (8) metastatic cancer.
The detailed subject inclusion and exclusion flowcharts for datasets 1 and 2 are shown in Fig. 1. A prior study reported the 102 CT images and FM segmentations in dataset 1 included in this study.16 The prior study evaluated the performance of a CT radiomic model to detect small cell lung cancer (SCLC). This study examines the performance of various segmentation methods and their corresponding radiomic models in predicting benign and malignant lung nodules.
Fig. 1.
Subject inclusion and exclusion flowchart for benign and malignant lung nodules in (a) dataset 1 and (b) dataset 2.
2.2. Segmentation Methods
The FM, RB, BB, and nnU-Net segmentation methods are characterized as follows: FM represents the 3D manual delineation of a lung nodule; RB denotes the maximum VOI sphere within a lung nodule, where the center is visually selected; BB refers to the smallest cuboid VOI surrounding a lung nodule; and nnU-Net is a deep-learning-based segmentation method that autonomously adjusts its pre-processing, network architecture, training, and post-processing for any new segmentation task.17 A board-certified radiologist manually delineated the FM and RB segmentations, the BB segmentations were computationally derived from the FM segmentations, and the nnU-Net segmentations were defined by a 3D nnU-Net ensemble consisting of (1) a full-resolution 3D nnU-Net and (2) a cascade in which a first 3D nnU-Net operates on low-resolution images and a second, full-resolution 3D nnU-Net refines the predictions of the first. The FM was considered the reference standard for the RB, BB, and nnU-Net segmentation methods.
For the FM segmentations, the following guidelines were used: (1) solid regions of the lung nodule were included, (2) cystic regions of the lung nodule were excluded, (3) lobar blood vessels were excluded on contrast-enhanced (CE) CT scans, (4) dense ground glass or nodularity contiguous with the nodule as visualized on the lung windows were included, and (5) only a single lung nodule was segmented for each subject based on the nodule being surveilled or used for pathological tissue diagnosis.
For the RB segmentations, the sphere had to (1) stay entirely within the boundary of the solid lung nodule without extending to any adjacent ground glass, surrounding tissue, or normal lung parenchyma, (2) be the largest sphere that could remain completely inside the nodule, (3) have a maximum diameter of 40 pixels along the axial plane (due to limitations in ITK-SNAP v3.8.018), and (4) exclude any lobar blood vessels on CE CT scans. The BB segmentation was computationally derived as the smallest cuboid VOI that still included the entire FM segmentation of the lung nodule.
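The computational derivation of the BB from the FM segmentation can be sketched as follows. This is a minimal illustration, assuming the FM segmentation is available as a binary NumPy array and that the cuboid is aligned with the image axes; the function name `bounding_box_mask` and the toy volume are hypothetical and are not taken from the study's code.

```python
import numpy as np


def bounding_box_mask(fm_mask: np.ndarray) -> np.ndarray:
    """Return the smallest axis-aligned cuboid mask that contains fm_mask."""
    fm_mask = np.asarray(fm_mask, dtype=bool)
    if not fm_mask.any():
        raise ValueError("FM mask is empty; no bounding box can be derived.")
    coords = np.argwhere(fm_mask)        # one row of voxel indices per FM voxel
    lo = coords.min(axis=0)              # lowest index along each axis
    hi = coords.max(axis=0) + 1          # one past the highest index
    bb_mask = np.zeros_like(fm_mask, dtype=bool)
    bb_mask[tuple(slice(a, b) for a, b in zip(lo, hi))] = True
    return bb_mask


# Toy example (assumed data): an irregular "nodule" in a 6 x 6 x 6 volume.
fm = np.zeros((6, 6, 6), dtype=bool)
fm[1:3, 2:5, 2] = True
fm[2, 2, 2:4] = True
bb = bounding_box_mask(fm)
```

By construction, every FM voxel lies inside the resulting cuboid, and shrinking the cuboid along any axis would exclude at least one FM voxel.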
We trained the 3D nnU-Net on 1108 CT studies and their segmentations from three publicly available lung CT datasets on The Cancer Imaging Archive18: (1) The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) (),19 (2) Non-small cell lung cancer (NSCLC)-Radiogenomics (),20,21 and (3) NSCLC-Radiomics-Genomics ().2,22 We applied all 3D nnU-Net configurations, including 3D-lowres, 3D-fullres, and 3D-cascade-fullres, in a five-fold cross-validation over the training set. We then had nnU-Net determine the best combination, which consisted of an ensemble of 3D-fullres and 3D-cascade-fullres, followed by image post-processing.
2.3. Segmentation Time
We recorded the time in seconds for two radiologists to delineate an FM, RB, and BB segmentation of a benign, adenocarcinoma (ADC), and SCLC lung nodule in ITK-SNAP.23 The first radiologist (S.B.M.) had 7 years of experience and had used ITK-SNAP before, whereas the second radiologist (C.P.) had 9 years of experience and had never used ITK-SNAP. Both radiologists performed one practice segmentation to become familiar with ITK-SNAP. The radiologists performed the FM segmentation last to provide the fastest segmentation time possible, as the lung nodule was reviewed for both the RB and BB segmentations. The radiologists carried out the segmentation methods at separate instances on the same computer, and their time was recorded by a third radiologist (R.P.S.).
2.4. Radiomic Features
We extracted image biomarker standardization initiative (IBSI)24 compliant radiomic features from the FM, RB, BB, and nnU-Net segmentations of the lung nodules on the CT images within the lung window (: 1600, : ) using Pyradiomics v3.0.1.25 We used principal component analysis (PCA) to assess whether the variation in radiomic features could be attributed to batch effects. ComBat harmonization was then used to rectify the variation in radiomic features caused by these batch effects.26 Shape features derived from RB and BB segmentations were omitted.
2.5. Inter-rater Reliability
Acquiring dependable lung nodule segmentations is a complex task, as it is influenced by both intra- and inter-rater variations among expert radiologists, and radiomic features are susceptible to these inconsistencies.27 To simulate the segmentations of multiple readers, we performed erosions and dilations on the FM, RB, and BB segmentations of a subset of benign () and malignant () lung nodules in dataset 1.
In addition, a second radiologist created FM and RB segmentations for the same subset () of lung nodules in dataset 1. The inter-reader consistency between the two radiologists, as well as the erosions and dilations, were assessed using the intra-class correlation coefficient (ICC). We selected radiomic features that had an ICC of at least 0.8, which is indicative of good reliability.28 We subsequently excluded the subset of 20 lung nodules used to assess inter-rater reliability from dataset 1 to reduce the risk of data leakage.
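The ICC-based feature screening can be illustrated with a short sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, as are the function name `icc_2_1` and the toy feature values.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings has one row per nodule and one column per reader.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()   # between nodules
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()   # between readers
    ss_total = ((y - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return float((ms_rows - ms_err) / denom)


# Toy example (assumed values): one radiomic feature measured on five nodules
# from the segmentations of two readers.
reader_1 = np.array([10.2, 14.8, 9.1, 20.5, 17.3])
reader_2 = np.array([10.6, 14.1, 9.4, 21.0, 16.8])
icc = icc_2_1(np.column_stack([reader_1, reader_2]))
is_reliable = icc >= 0.8   # retention threshold described in the text
```

In practice, this calculation would be repeated for each radiomic feature, and only features meeting the threshold would be passed on to model building.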
2.6. Segmentation Agreement
We used the Dice similarity coefficient (DSC) to evaluate the agreement between nnU-Net and FM segmentations. The DSC is a metric used to evaluate the performance of image segmentation algorithms.29,30 The DSC score ranges from 0 to 1, where 0 indicates no overlap between the segmented regions, and 1 indicates a perfect overlap or identical segmentation results. To compare CT radiomic models based on FM, RB, BB, and nnU-Net segmentation methods, we excluded lung nodules from datasets 1 and 2 that had no agreement (DSC = 0) between nnU-Net and FM segmentations to gauge the best possible performance of the nnU-Net model.
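For two binary masks A and B, the DSC is 2|A ∩ B| / (|A| + |B|). The sketch below shows one way to compute it, assuming both segmentations are binary NumPy arrays on the same voxel grid; the function name `dice_similarity` and the toy masks are illustrative only.

```python
import numpy as np


def dice_similarity(mask_a, mask_b) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")              # undefined when both masks are empty
    return float(2.0 * np.logical_and(a, b).sum() / denom)


# Toy example (assumed data): two 32-voxel slabs that share 16 voxels.
fm_mask = np.zeros((4, 4, 4), dtype=bool)
nn_mask = np.zeros((4, 4, 4), dtype=bool)
fm_mask[0:2] = True
nn_mask[1:3] = True
dsc = dice_similarity(fm_mask, nn_mask)   # 2 * 16 / (32 + 32) = 0.5
```

A nodule that nnU-Net misses entirely, or places elsewhere in the lung, yields a DSC of 0, which is the exclusion condition described above.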
2.7. Correlated Radiomic Features
Radiomic features are recognized to have high correlations, rendering some redundant.31 We used the pairwise Spearman’s correlation coefficient to eliminate correlated features with a threshold exceeding 0.9. This process was performed iteratively, beginning with the pair exhibiting the highest correlation and without knowledge of the outcome data. The feature with the highest average correlation with the remaining features was discarded for each pair. We first removed correlated radiomic features separately within the following three radiomic feature categories: shape, statistics, and texture. Subsequently, we eliminated features with between-class correlations exceeding 0.9, prioritizing (1) shape features over statistics or texture features and (2) statistics features over texture features. Consequently, we retained features with simpler physical and statistical interpretations.8,32
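The iterative removal within a single feature category can be sketched as follows. This is a simplified illustration and not the study's implementation: it assumes no constant features, ranks values without averaging ties, and omits the subsequent between-category step; the function and feature names are hypothetical.

```python
import numpy as np


def spearman_abs(x: np.ndarray) -> np.ndarray:
    """Absolute Spearman correlations between the columns of x (ties not averaged)."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.abs(np.corrcoef(ranks, rowvar=False))


def prune_correlated(x: np.ndarray, names: list, threshold: float = 0.9) -> list:
    """Iteratively drop one feature from the most correlated pair above threshold."""
    corr = spearman_abs(x)
    np.fill_diagonal(corr, 0.0)
    keep = list(range(x.shape[1]))
    while len(keep) > 1:
        sub = corr[np.ix_(keep, keep)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] <= threshold:
            break                                   # no pair exceeds the threshold
        # Discard the member of the pair that is, on average, more correlated
        # with the other remaining features.
        mean_corr = sub.sum(axis=1) / (len(keep) - 1)
        keep.pop(i if mean_corr[i] >= mean_corr[j] else j)
    return [names[k] for k in keep]


# Toy example (assumed data): f_a_cubed is a monotonic transform of f_a.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
c = rng.normal(size=50)
features = np.column_stack([a, a ** 3, c])
kept = prune_correlated(features, ["f_a", "f_a_cubed", "f_c"])
```

Because the procedure uses only the feature matrix, it can be run without access to the benign or malignant labels, consistent with the description above.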
2.8. Model Building
Random forest was used to classify benign and malignant lung nodules using radiomic features. Class weighting was applied to address imbalanced classes; a total of 100 trees were incorporated for satisfactory model accuracy; and a five-leaf node maximum was set to avoid overfitting. The area under the curve (AUC) was used as the performance metric to evaluate the classification models. The results of a 100-times repeated stratified five-fold cross-validation were pooled into a single receiver operating characteristic (ROC) curve with mean AUCs and 95% CIs for each FM, RB, BB, and nnU-Net model.
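A model of this kind could be set up along the following lines. This is a hedged sketch using scikit-learn, which the text does not name: the "five-leaf node maximum" is interpreted here as `max_leaf_nodes=5`, class weighting as `class_weight="balanced"`, and AUCs are averaged per fold rather than pooled into one ROC curve. The function name `repeated_cv_auc` and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold


def repeated_cv_auc(x, y, n_repeats=100, seed=0):
    """Per-fold AUCs from repeated stratified five-fold cross-validation."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    aucs = []
    for train_idx, test_idx in cv.split(x, y):
        model = RandomForestClassifier(
            n_estimators=100,           # 100 trees, as described in the text
            max_leaf_nodes=5,           # assumed reading of the five-leaf limit
            class_weight="balanced",    # assumed form of class weighting
            random_state=seed,
        )
        model.fit(x[train_idx], y[train_idx])
        prob = model.predict_proba(x[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], prob))
    return np.asarray(aucs)


# Synthetic stand-in for a radiomic feature matrix (80 nodules, 6 features).
rng = np.random.default_rng(0)
x = rng.normal(size=(80, 6))
y = (x[:, 0] + 0.5 * rng.normal(size=80) > 0).astype(int)
aucs = repeated_cv_auc(x, y, n_repeats=2)   # 2 repeats keep the demo fast
mean_auc = aucs.mean()
```

Setting `n_repeats=100` would match the number of repetitions described in the text, at the cost of fitting 500 forests per segmentation method.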
2.9. Statistical Analysis
In datasets 1 and 2, the power of the ROC curve for detecting an AUC of 0.70 was 0.96 and 0.91, respectively. The Kruskal–Wallis test compared FM, RB, and BB segmentation times. The results of a 100-times repeated stratified five-fold cross-validation were pooled into a ROC curve with mean AUCs and 95% CIs. The 95% CIs of AUCs were evaluated using the DeLong asymptotically exact test.33 The statistical significance of the difference in AUC between models was also assessed by DeLong’s test. All statistical tests were two-sided, and p-values below 0.05 were considered statistically significant. Python v3.11 was used with Pyradiomics v3.0.1 and SimpleITK34–36 for image pre- and post-processing, nnU-Net segmentations, and radiomic models. R v4.2.3 was used with pROC37 for ROC curves, AUCs, 95% CIs, and DeLong and Kruskal–Wallis tests.
3. Results
3.1. Data Collection
Subject and lung nodule characteristics are provided in Table 1 for datasets 1 and 2. In dataset 1, which consisted of 102 lung nodules, 48/102 (47.1%) were benign, whereas 17/102 (16.7%), 11/102 (10.8%), and 26/102 (25.5%) were ADC, squamous cell carcinoma (SCC), and SCLC, respectively. Likewise, in dataset 2 with 86 lung nodules, 51/86 (59.3%) were benign, whereas 21/86 (24.4%), 10/86 (11.6%), and 4/86 (4.7%) were ADC, SCC, and SCLC, respectively. In dataset 1, the malignant lung nodules had a larger volume than the benign lung nodules, with a .
Table 1.
Subject and lung nodule characteristics.
| Dataset | Characteristic | Benign | ADC | SCC | SCLC | Total |
|---|---|---|---|---|---|---|
| 1 | No. of subjects | 48 | 17 | 11 | 26 | 102 |
|  | Sex |  |  |  |  |  |
|  | Male | 47 | 16 | 11 | 26 | 102 |
|  | Female | 1 | 1 | 0 | 0 | 2 |
|  | Median age (IQR) | 70 (65.5 to 75) | 71 (68 to 80) | 74 (71.5 to 79.5) | 72.5 (70 to 80.8) | 72 (67 to 78) |
|  | Median volume (IQR) | 801 (471.5 to 2023) | 19,397 (10,598 to 101,901) | 4126 (2446.8 to 19,157.3) | 31,161 (9308 to 145,633.5) | 3632.5 (834.8 to 22,013.8) |
| 2 | No. of subjects | 51 | 21 | 10 | 4 | 86 |
|  | Sex |  |  |  |  |  |
|  | Male | 50 | 20 | 10 | 4 | 84 |
|  | Female | 1 | 1 | 0 | 0 | 2 |
|  | Median age (IQR) | 70 (65.5 to 76) | 72 (67.5 to 76.3) | 71 (68.0 to 79) | 69 (67.5 to 69.5) | 70.5 (66.0 to 76) |
|  | Median volume (IQR) | 1321 (784 to 10,385.5) | 3676 (2831 to 25,600) | 3847 (831 to 16,626) | 17,452 (9896.8 to 26,113) | 3058 (865.8 to 14,682) |
Table 2 displays the CT scanning properties for datasets 1 and 2. General Electric Healthcare manufactured all CT scanners in both datasets. The CT scans in both datasets were acquired at a tube voltage of 120 kVp using a standard soft-tissue kernel, except for one CT scan in dataset 1, which had a tube voltage of 100 kVp, and another CT scan in dataset 2 that was acquired with a bone kernel.
Table 2.
CT scanning properties.
| Dataset | Manufacturer model | Benign | ADC | SCC | SCLC | Total |
|---|---|---|---|---|---|---|
| 1 | GE Medical Systems |  |  |  |  |  |
|  | Discovery CT 750 HD | 48 | 17 | 11 | 23 | 99 |
|  | LightSpeed VCT | 0 | 0 | 0 | 2 | 2 |
|  | Optima CT660 | 0 | 0 | 0 | 1 | 1 |
|  | Contrast |  |  |  |  |  |
|  | No | 36 | 9 | 8 | 9 | 62 |
|  | Yes | 12 | 8 | 3 | 17 | 40 |
|  | Kernel |  |  |  |  |  |
|  | Standard | 48 | 17 | 11 | 26 | 102 |
|  | Tube voltage (kVp) |  |  |  |  |  |
|  | 120 | 47 | 17 | 11 | 26 | 102 |
|  | 100 | 0 | 0 | 0 | 0 | 0 |
| 2 | GE Medical Systems |  |  |  |  |  |
|  | Discovery CT 750 HD | 38 | 16 | 6 | 1 | 62 |
|  | LightSpeed VCT | 11 | 4 | 3 | 2 | 20 |
|  | Discovery CT 690 HD | 1 | 0 | 1 | 1 | 3 |
|  | Revolution CT | 0 | 1 | 0 | 0 | 1 |
|  | Contrast |  |  |  |  |  |
|  | No | 40 | 14 | 7 | 3 | 64 |
|  | Yes | 11 | 7 | 3 | 1 | 22 |
|  | Kernel |  |  |  |  |  |
|  | Standard | 49 | 21 | 10 | 4 | 84 |
|  | Bone | 2 | 0 | 0 | 0 | 2 |
|  | Tube voltage (kVp) |  |  |  |  |  |
|  | 120 | 41 | 21 | 10 | 4 | 86 |
3.2. Segmentation Time
The median time and interquartile range (IQR) in seconds needed for two radiologists to perform the FM, RB, and BB segmentations of a benign, ADC, and SCLC lung nodule was 113 (54 to 251.5), 21 (9.25 to 38), and 16 (12 to 64.25) s, respectively, with a p-value of 0.04. Figure 2(b) presents a bar plot illustrating the median time in seconds the two radiologists took to delineate FM, RB, and BB segmentations of the benign, ADC, and SCLC lung nodules, as well as examples of segmentations. Irrespective of the diagnosis, subtype, or volume, FM segmentations required longer segmentation times for benign, ADC, and SCLC than their respective RB and BB segmentations.
Fig. 2.
(a) Displayed in the axial plane are cropped CT images of three subjects—a 71-year-old male with a benign lung nodule, an 84-year-old male with an ADC lung nodule, and a 67-year-old male with an SCLC lung nodule arranged from top to bottom. Note that while cropped images were provided for ease of viewing, the original full resolution image was used for model development. Each CT image is overlaid with segmentations from four methods: FM, RB, BB, and nnU-Net, arranged from left to right. In addition, a bar plot (b) illustrates the average time, in seconds, required for two radiologists to delineate the FM, RB, and BB segmentations for these benign, ADC, and SCLC lung nodules.
3.3. Radiomic Features
In total, 107 IBSI-compliant radiomic features were extracted from each FM, RB, BB, and nnU-Net segmentation method. Figure 3 depicts the first two principal components (PCs) of the radiomic features extracted from the FM segmentations in datasets 1 and 2. Each sample is color-coded based on whether it was acquired from a CE or non-CE (NCE) CT scan. In datasets 1 and 2, the contrast batch is highlighted by the color separation observed in Figs. 3(a) and 3(b). In dataset 1, we observe a separation of samples from CE and NCE CT scans. The batch variation is mostly explained by PC1. Figures 3(c) and 3(d) demonstrate that ComBat effectively harmonized the feature variation due to the contrast batch effect in datasets 1 and 2, respectively.
Fig. 3.
PCA plots illustrate the multivariate variation among the 107 radiomic features obtained from the FM segmentation consisting of CE and NCE CT scans before ComBat in (a) dataset 1 and (b) dataset 2, and after ComBat in (c) dataset 1 and (d) dataset 2. The percentage of explained variation is shown in brackets with the axis label.
From the 107 IBSI-compliant radiomic features obtained from the FM, RB, BB, and nnU-Net segmentations, all 107 features were retained from the FM and nnU-Net segmentations, and 94 radiomic features were kept from the RB and BB segmentations. Shape features were included in the radiomic models originating from FM and nnU-Net segmentations, whereas they were excluded in the models based on RB and BB segmentations due to their fixed shapes.
3.4. Inter-Rater Reliability
Using the subset (n = 20) of dataset 1 with FM, RB, and BB segmentations delineated by the two radiologists, 76, 84, and 104 IBSI-compliant radiomic features had an ICC of at least 0.8. In addition, the erosions and dilations applied to these FM, RB, and BB segmentations produced 58, 66, and 106 IBSI-compliant radiomic features. Out of the radiomic features derived from erosions and dilations, 52/58 (89.7%), 60/66 (90.9%), and 103/106 (97.2%) were shared with the features originating from the FM, RB, and BB segmentations as delineated by both radiologists. Although erosions and dilations provided a reasonable approximation of the radiologists’ assessments, i.e., a reasonable method of simulating multiple segmenters, we chose to use the IBSI-compliant radiomic features with an ICC of at least 0.8 from the two radiologists, given that this method is traditionally used.
3.5. Segmentation Agreement
The median DSC (IQR) values were 0.65 (0.40 to 0.77) and 0.65 (0.43 to 0.78) in datasets 1 and 2, respectively. In datasets 1 and 2, which contained 82 and 86 lung nodules (after removal of the 20 nodules used for feature selection in dataset 1), respectively, 20 and 12 nodules had nnU-Net segmentations with no agreement (DSC = 0) with the FM segmentations. These nodules were subsequently removed from the comparison of FM, RB, BB, and nnU-Net models presented in Fig. 5. In dataset 1, 20/82 (24.4%) lung nodules, consisting of benign (), ADC (), and SCLC (), exhibited a DSC of 0 between their nnU-Net and FM segmentations. The median volume (IQR) in of these benign, ADC, and SCLC lung nodules in dataset 1 was 1000 (479 to 6730), 142,756 (90,121 to 195,391), and 16,807 (9246 to 79,374), respectively. In dataset 2, 12/86 (14%) lung nodules comprising benign (), ADC (), SCC (), and SCLC () had a DSC of 0 between their nnU-Net and FM segmentations. The median volume (IQR) in of these benign, ADC, SCC, and SCLC lung nodules in dataset 2 was 959 (758 to 2012), 2889, 3876, and 21,872, respectively. Among the 20 lung nodules in dataset 1 with a DSC of 0 between their nnU-Net and FM segmentations, 13 were NCE CT scans, and 7 were CE CT scans. Likewise, in dataset 2, 9 out of the 12 lung nodules with a DSC of 0 between their nnU-Net and FM segmentations were NCE CT scans, and 3 were CE CT scans.
Fig. 5.
Pooled ROC curves with 95% CIs for the CT radiomic models derived from FM, RB, BB, and nnU-Net segmentation for classifying benign and malignant lung nodules in (a) dataset 1 (n = 62) and (b) dataset 2 (n = 74).
3.6. Correlated Radiomic Features
The FM, RB, and BB segmentation-derived radiomic models incorporated 25, 26, and 38 radiomic features, respectively, from the 82 lung nodules in dataset 1. Similarly, these models included 27, 29, and 41 radiomic features, respectively, from the 86 lung nodules in dataset 2. The FM, RB, BB, and nnU-Net segmentation-derived radiomic models contained 22, 24, 38, and 48 radiomic features, respectively, from the 62 lung nodules in dataset 1. Likewise, these models included 26, 28, 38, and 44 radiomic features, respectively, from the 74 lung nodules in dataset 2. All of these radiomic feature sets had an ICC of at least 0.8 and pairwise correlations not exceeding 0.9.
3.7. Model Building
Figure 4 displays the combined ROC curves for CT radiomic models based on FM, RB, and BB segmentations for classifying benign and malignant lung nodules in (a) dataset 1 (n = 82) and (b) dataset 2 (n = 86). In both datasets, the AUC differences between the FM and RB, as well as FM and BB models, were significant, with . Figure 5 illustrates the pooled ROC curves for CT radiomic models using FM, RB, BB, and nnU-Net segmentations in (a) dataset 1 (n = 62) and (b) dataset 2 (n = 74). In dataset 1, the AUC differences between the FM and RB, and FM and nnU-Net models were significant with ; however, the AUC difference between the FM and BB model was not significant, with a p-value of 0.16. In dataset 2, the AUC differences between the FM and RB, FM and BB, and FM and nnU-Net segmentations were all significant with .
Fig. 4.
Combined ROC curves with 95% CIs for the CT radiomic models based on FM, RB, and BB segmentations for classifying benign and malignant lung nodules in (a) dataset 1 (n = 82) and (b) dataset 2 (n = 86).
4. Discussion
This study compared the performance of RB, BB, and nnU-Net segmentation methods with the standard FM segmentation method to predict benign and malignant lung nodules using CT radiomic models. The RB segmentation had the shortest segmentation time compared to the BB and FM segmentations and the best classification performance in both datasets. The BB segmentations had a faster segmentation time than the FM segmentations, and the BB models had classification performance similar to that of the FM models and better than that of the nnU-Net models in both datasets. Although most nnU-Net segmentations exhibited good agreement with their respective FM segmentations in both datasets, a subset of lung nodules had no agreement between nnU-Net and FM segmentations. Moreover, the nnU-Net segmentations had a relatively poor classification performance compared to the FM, RB, and BB segmentations in both datasets 1 and 2.
In this study, consistent acquisition parameters, pre-processing, and radiomic feature extraction were employed in both datasets, ensuring that the radiomic features were solely dependent on the segmentation method. The enhanced performance of the radiomic model derived from RB segmentation might be due to the intra-tumoral RB segmentation located within and at the center of the lung nodule. The high AUC of the RB segmentation-derived model in both datasets also suggests that the model is capable of effectively detecting a set of pertinent radiomic features. Moreover, after feature reduction, the RB model contained fewer radiomic features, which helped avoid overfitting. Employing the erosion and dilation method previously described may efficiently mimic multiple readers and could be utilized for further feature reduction as an additional tool to help prevent overfitting.
The BB model’s performance may be due to its peri-tumoral BB segmentation just outside the lung nodule, which has been reported to be advantageous for detecting tumor spread and predicting survival in early-stage lung cancer.38,39 Although the BB model did not outperform the RB model, it had a similar classification performance to the FM segmentations in both datasets. With its heterogeneous peri- and intra-tumoral BB segmentation, the BB model contained many reliable and non-redundant radiomic features. However, some of these features were likely irrelevant, allowing BB models to misinterpret noise as a signal. The BB model could be valuable for studies focusing on the peri-tumoral environment, but feature selection should be performed. It is worth noting that we assessed the optimal bounding box, and manually drawn bounding boxes may not yield the same level of performance.
Although nnU-Net segmentations exhibited satisfactory agreement with FM segmentations, the nnU-Net model demonstrated the weakest classification performance across both datasets. This may be because nnU-Net was trained only on benign, ADC, and SCC nodules in the LIDC-IDRI, NSCLC-Radiogenomics, and NSCLC-Radiomics-Genomics datasets, resulting in a poor agreement between nnU-Net and FM segmentations for SCLC lung nodules. To improve its performance, retraining the nnU-Net on a dataset with lung nodules similar to those in this study, including SCLC lung nodules, could be beneficial. Despite these limitations, 3D nnU-Net segmentations could emerge as a fast, scalable, and reproducible segmentation method.
Our study had several limitations. First, a single radiologist delineated all FM and RB segmentations. A second radiologist provided a subset of FM and RB segmentations in dataset 1, allowing us to identify reliable radiomic features for FM, RB, and BB segmentations. Nevertheless, we had to remove these 20 lung nodules from dataset 1, further reducing its small sample size. We aimed to simulate multiple readers by dilating and eroding FM, RB, and BB segmentations. Although the results appear promising, further research is required to validate these findings. Second, datasets 1 and 2 had small sample sizes of 102 and 86, respectively. Unfortunately, no large public lung CT datasets with pathologically diagnosed benign and malignant lung nodules are available. While the LIDC-IDRI dataset () does contain benign and malignant lung nodules, only of these nodules have a pathological tissue diagnosis, and the majority of the malignant nodules are related to metastatic cancer. After excluding (1) the 20 lung nodules from dataset 1, which were used to identify reliable radiomic features, and (2) the lung nodules with no agreement (DSC = 0) between nnU-Net and FM segmentation, the sample sizes for datasets 1 and 2 were reduced to 62 and 74, respectively. Moreover, an accurate evaluation requires a train-test split along with an independent validation set; however, given the small sample sizes of datasets 1 and 2, the CT radiomics models were assessed using 100 iterations of stratified five-fold cross-validation. Third, our study primarily demonstrates the effectiveness of CT radiomic models derived from FM, RB, BB, and nnU-Net segmentations in predicting benign and malignant lung nodules across two datasets. Consequently, further research is needed to explore the performance of these models for CT scans of other organs and different classifications.
Radiomic models originating from BB and nnU-Net segmentations demonstrated similar and poor performance, respectively, compared to those derived from FM segmentations. Radiomic models developed from RB segmentations, however, outperformed those originating from FM segmentations. Radiologists can perform RB and BB segmentations quickly and easily, whereas nnU-Net segmentations require radiologists to examine and validate them. While RB and BB segmentations require multiple readers, nnU-Net must be trained on a diverse variety of lung nodule types to achieve optimal agreement with FM segmentations. RB segmentations hold promise for facilitating the large-scale, multi-center studies needed to integrate CT radiomic models into clinical practice. Moreover, these efficient and straightforward RB segmentations can serve as practical alternatives to laborious and time-consuming FM segmentations.
Acknowledgments
Supported by an institutional grant from Stanford University Artificial Intelligence in Medicine and Imaging Center and by the National Institutes of Health (Grant No. U24 CA180927). H.M.S. is supported by the Big Data Scientific Training Enhancement Program (BD-Step). This study was carried out with the assistance of resources and facilities at the VA Palo Alto Health Care System in Palo Alto, California, United States.
Biography
Biographies of the authors are not available.
Disclosures
R.P.S. reports grants from Stanford University, during the conduct of the study; personal fees from Genentech, personal fees from Intuit Surgical, personal fees from Artio Medical, grants from Merit Medical, contracted research from Canon, Inc., and contracted research from Lucence Health. S.N. reports personal fees from Fovia, Inc., Member of Scientific Advisory Board from Radlogics, Inc., Member of Scientific Advisory Board from EchoPixel, Inc., outside the submitted work. O.G. reports grants from Onc.AI, grants from Lucence Diagnostics, and grants from Nividien Inc., outside the submitted work. No industry support was provided for this study. The remaining authors have no conflicts of interest to disclose.
Contributor Information
Heather M. Selby, Email: selbyh@stanford.edu.
Pritam Mukherjee, Email: pritam.mukherjee@nih.gov.
Christopher Parham, Email: caparham@stanford.edu.
Sachin B. Malik, Email: sbmalik@stanford.edu.
Olivier Gevaert, Email: olivier.gevaert@stanford.edu.
Sandy Napel, Email: snapel@stanford.edu.
Rajesh P. Shah, Email: rajshah@stanford.edu.
Code and Data Availability
The authors can provide the code and data supporting the conclusions of this study upon request.





