Radiology: Artificial Intelligence. 2024 Jan 31;6(2):e230118. doi: 10.1148/ryai.230118

Identification of Precise 3D CT Radiomics for Habitat Computation by Machine Learning in Cancer

Olivia Prior 1, Carlos Macarro 1, Víctor Navarro 1, Camilo Monreal 1, Marta Ligero 1, Alonso Garcia-Ruiz 1, Garazi Serna 1, Sara Simonetti 1, Irene Braña 1, Maria Vieito 1, Manuel Escobar 1, Jaume Capdevila 1, Annette T Byrne 1, Rodrigo Dienstmann 1, Rodrigo Toledo 1, Paolo Nuciforo 1, Elena Garralda 1, Francesco Grussu 1, Kinga Bernatowicz 1,#, Raquel Perez-Lopez 1,✉,#
PMCID: PMC10982821  PMID: 38294307

Abstract

Purpose

To identify precise three-dimensional radiomics features in CT images that enable computation of stable and biologically meaningful habitats with machine learning for cancer heterogeneity assessment.

Materials and Methods

This retrospective study included 2436 liver or lung lesions from 605 CT scans (November 2010–December 2021) in 331 patients with cancer (mean age, 64.5 years ± 10.1 [SD]; 185 male patients). Three-dimensional radiomics were computed from original and perturbed (simulated retest) images with different combinations of feature computation kernel radius and bin size. The lower 95% confidence limit (LCL) of the intraclass correlation coefficient (ICC) was used to measure repeatability and reproducibility. Precise features were identified by combining repeatability and reproducibility results (LCL of ICC ≥ 0.50). Habitats were obtained with Gaussian mixture models in original and perturbed data using precise radiomics features and compared with habitats obtained using all features. The Dice similarity coefficient (DSC) was used to assess habitat stability. Biologic correlates of CT habitats were explored in a case study, with a cohort of 13 patients with CT, multiparametric MRI, and tumor biopsies.

Results

Three-dimensional radiomics showed poor repeatability (LCL of ICC: median [IQR], 0.442 [0.312–0.516]) and poor reproducibility against kernel radius (LCL of ICC: median [IQR], 0.440 [0.33–0.526]) but excellent reproducibility against bin size (LCL of ICC: median [IQR], 0.929 [0.853–0.988]). Twenty-six radiomics features were precise, differing in lung and liver lesions. Habitats obtained with precise features (DSC: median [IQR], 0.601 [0.494–0.712] and 0.651 [0.52–0.784] for lung and liver lesions, respectively) were more stable than those obtained with all features (DSC: median [IQR], 0.532 [0.424–0.637] and 0.587 [0.465–0.703] for lung and liver lesions, respectively; P < .001). In the case study, CT habitats correlated quantitatively and qualitatively with heterogeneity observed in multiparametric MRI habitats and histology.

Conclusion

Precise three-dimensional radiomics features were identified on CT images that enabled tumor heterogeneity assessment through stable tumor habitat computation.

Keywords: CT, Diffusion-weighted Imaging, Dynamic Contrast-enhanced MRI, MRI, Radiomics, Unsupervised Learning, Oncology, Liver, Lung

Supplemental material is available for this article.

© RSNA, 2024

See also the commentary by Sagreiya in this issue.


Summary

Tumor heterogeneity was evaluated by computing stable CT tumor habitats with unsupervised learning on repeatable and reproducible three-dimensional radiomics features in lung and liver cancer lesions.

Key Points

  • In this retrospective study of 2436 tumoral lesions from 605 CT scans, three-dimensional radiomics features showed poor repeatability (median lower 95% confidence limit [LCL] of the intraclass correlation coefficient [ICC], 0.442) and reproducibility (LCL of ICC, 0.440) against kernel radius but excellent reproducibility (LCL of ICC, 0.929) against bin size.

  • Of 91 three-dimensional radiomics features analyzed, 26 were identified as precise (ie, featuring moderate, good, or excellent repeatability and reproducibility), with different sets of precise features identified in lung and liver lesions; habitats obtained with the subsets of precise radiomics features were more stable than those obtained with all computed features in both lung and liver lesions (13% and 11% increase in median Dice similarity coefficient in lung and liver lesions, respectively; P < .001).

  • In an exploratory case study, CT habitats correlated well with multiparametric MRI habitats and histology, capturing intratumoral heterogeneity (eg, areas with different tumor cell density and vascularization or characterized by necrosis or fibrosis).

Introduction

Intratumoral spatial heterogeneity is a well-known characteristic of cancer (1). At the smallest scales, such heterogeneity refers to regions with diverse clones and tumor microenvironment harboring various levels of genomic and transcriptomic expressions. At macroscopic scales, this translates to tumor niches (ie, subpopulation of cells localized in a particular intratumoral region) and tissue types (ie, fibrosis, necrosis) (2). Such heterogeneity poses a challenge to targeted therapies, as distinct intratumoral regions may develop resistance to treatment (3,4). As a result, new lines of research have emerged to detect and quantify such heterogeneity noninvasively. A notable example, holding promise in the clinical setting, is CT-based habitat imaging, which aims to identify spatial regions (habitats) that exhibit shared imaging phenotypes in CT scans (5,6). The main advantage of CT tumor habitats is their ability to capture heterogeneity of the whole tumor (ie, three-dimensionally) noninvasively. As such, in the age of precision medicine, CT tumor habitats could serve as a valuable tool to detect treatment-resistant habitats (and thus improve treatment selection), as well as to monitor response and tumor evaluation longitudinally and repeatedly throughout disease progression (7).

Successful implementation of tumor habitats in the clinic requires robust voxelwise or three-dimensional (3D) radiomics features (RFs) (8), which are the underlying imaging texture features used to generate them. In other words, RFs should be both repeatable (ie, exhibiting measurement precision under the same set of computation conditions, also known as test-retest) and reproducible (ie, exhibiting measurement precision under different computation conditions) (9,10). However, current literature on the precision of CT radiomics is focused on RFs as independent predictive or prognostic biomarkers and is limited regarding cancer types and reporting quality, with many studies neglecting to provide critical information such as whether texture RFs were computed in two dimensions or 3D (11,12). This latter point is especially relevant for habitat computation since two-dimensional RFs, which are computed disregarding neighboring voxels on out-of-plane sections, are less representative of tumor heterogeneity (13,14). Thus, there is a lack of knowledge on precision of 3D RFs for CT tumor habitat computation.

In this study, we aimed to fill this knowledge gap and assess how 3D RFs are affected by three different sources of variability: (a) test-retest scenarios; (b) changes in kernel radius, which specifies the number of neighboring voxels to be considered when computing features; and (c) changes in bin size, which defines the number of gray levels before feature computation. Features showing acceptable repeatability and reproducibility against the two computation parameters were identified as precise. We developed a Gaussian mixture model–based unsupervised machine learning model (15,16) for tumor habitat computation and studied whether the use of precise RFs resulted in more stable habitats. Last, we explored the biologic correlates of CT habitats in an independent cohort with multiparametric MRI (mpMRI) and digitized images of hematoxylin-eosin–stained biopsies. Our main goal was to develop a method to compute stable lung and liver tumor habitats based on the identified precise 3D radiomic features and Gaussian mixture models to assess intratumoral heterogeneity.

Materials and Methods

Patient Cohorts

We retrospectively analyzed 2436 lesions (1861 liver and 575 lung) from 331 patients (mean age, 64.5 years ± 10.1 [SD]; 185 male patients) with CT scans at multiple time points (605 total CT scans [Table 1]). Intravenous contrast-enhanced CT scans from patients with advanced cancer and lung or liver tumors from November 2010 to December 2021 were included. The analysis of these anonymized CT images was approved by the Vall d’Hebron Ethics Committee with waiver of informed consent. The study sample was split into four different cohorts (Fig 1A), depending on primary tumor location: (a) colorectal, (b) lung, (c) gastrointestinal neuroendocrine tumors, and (d) a cohort including a mix of other cancers. Patients with gastrointestinal neuroendocrine tumors were selected from the multicenter phase 2 TALENT (Lenvatinib Efficacy in Metastatic Neuroendocrine Tumors) trial (NCT02678780). Details of patient cohorts and imaging protocols are reported in Tables S1–S3. Patients from the independent cohort with CT, mpMRI, and biopsies, who formed our case study, were enrolled in the PREDICT prospective trial (also approved by the Vall d’Hebron Ethics Committee, PR [AG]29/2020) and signed consent for the acquisition and analysis of the CT and MRI scans and tumor biopsies. Details regarding imaging protocols and clinical information are reported in Appendix S1 and Tables S4 and S5.

Table 1:

Characteristics of Study Cohorts

Figure 1:

(A) Graphs of distribution of lung and liver lesions across different cohorts for precision analysis. The neuroendocrine cohort included only liver lesions. (B) Graphic of precision analysis design. Three-dimensional (3D) radiomics features were computed from both original and perturbed images, four times per image, each time with a different combination of kernel radius (R) (1 mm or 3 mm) and bin size (B) (12 HU or 25 HU). To study repeatability, original-perturbed feature pairs were evaluated for every combination of computation settings (R1B12, R1B25, R3B12, and R3B25). To study reproducibility against computation parameters, we compared pairs of original features computed under varying settings. To understand reproducibility against the kernel radius, we kept the bin size constant at two separate levels (12 HU and 25 HU) and then altered the kernel radius. To explore reproducibility against the bin size, we kept the kernel radius constant at two different measures (1 mm and 3 mm) and varied the bin size. Precise features were selected by linking reproducibility and repeatability results. ROI = region of interest.

Image Segmentation, Perturbation, and Feature Computation

An experienced radiologist (R.P.L.) with more than 10 years of experience in oncologic imaging segmented the entire volume of all measurable lesions according to the Response Evaluation Criteria in Solid Tumors, version 1.1 (17) (ie, maximal diameter ≥ 10 mm) using 3D Slicer, version 4.11.20210226 (https://www.slicer.org/) (18). We assessed repeatability by simulating a retest scenario with image perturbation using the Medical Image Radiomics Processor Python toolkit, version 1.2.0 (https://github.com/oncoray/mirp) (19). Details are provided in Appendix S2. Original (ie, no-filter) RFs were computed for every lesion using PyRadiomics, version 3.0.1 (https://github.com/AIM-Harvard/pyradiomics/) (20). All RFs were computed four times per lesion, each time using a different combination of settings for kernel radius (1 mm or 3 mm) and bin size (12 HU or 25 HU), hereinafter referred to as R1B12, R1B25, R3B12, and R3B25. We selected kernel radius and bin size values based on common practices, including those used by PyRadiomics, which defaults to kernel radius of 1 mm and bin size of 25 HU—these served as our primary reference points. Additionally, we selected bin size of 12 HU and kernel radius of 3 mm, as these values are also commonly employed in the field. In total, 91 features were analyzed (the full list is available in Table S4). Examples of RFs are displayed in Figure S1, and relevant computation details are reported in Table S5 and Appendix S3, in compliance with the Image Biomarker Standardization Initiative (21).
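To make the role of bin size concrete, the following is a minimal sketch of fixed-bin-width gray-level discretization, applied with the two bin sizes studied here (12 HU and 25 HU). It assumes the fixed-bin-width rule described in the PyRadiomics documentation (bin edges aligned to multiples of the bin width, lowest occupied bin numbered 1); the function name and the HU values are hypothetical and are not taken from the study code.

```python
import numpy as np

def discretize_fixed_bin_width(hu_values, bin_width):
    """Map HU intensities to 1-based gray levels using a fixed bin width."""
    hu = np.asarray(hu_values, dtype=float)
    # Bin edges sit at multiples of bin_width; the lowest occupied bin gets index 1.
    return (np.floor(hu / bin_width) - np.floor(hu.min() / bin_width) + 1).astype(int)

# Hypothetical HU values inside a lesion (illustration only)
roi = np.array([-5, 0, 11, 12, 24, 25, 49])

levels_b12 = discretize_fixed_bin_width(roi, 12)  # bin size of 12 HU
levels_b25 = discretize_fixed_bin_width(roi, 25)  # bin size of 25 HU

print(levels_b12)  # [1 2 2 3 4 4 6]
print(levels_b25)  # [1 2 2 2 2 3 3]
```

A smaller bin size keeps more distinct gray levels over the same intensity range, which is the quantity that texture matrices are built from.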

Repeatability and Reproducibility Analyses to Identify Precise Features

Figure 1B displays the precision analysis overview. We studied repeatability in four experiments, comparing radiomics values from original and perturbed (simulated retest) CT scans for each of the four setting combinations used (R1B12, R1B25, R3B12, and R3B25). For reproducibility against kernel radius, we conducted two experiments: first, we compared original RFs computed with different kernel radius and fixed bin size of 12 HU (ie, R1B12 vs R3B12); second, we compared original radiomics values computed with different kernel radius and fixed bin size of 25 HU (ie, R1B25 vs R3B25). Similarly, we studied reproducibility against bin size twice: first for original RFs computed with a fixed kernel radius of 1 mm (R1B12 vs R1B25) and then for original radiomics values computed with a fixed kernel radius of 3 mm (R3B12 vs R3B25). We conducted all experiments for all lesions and cohorts combined, and then for lung and liver lesions separately, as well as for each cohort (different primary tumor types) separately to understand the effect of primary tumor and lesion location on precision. The intraclass correlation coefficient (ICC) (22) was used to measure repeatability and reproducibility of features. A feature was selected as precise if the lower 95% confidence limit (LCL) of the ICC was 0.50 or more across the three relevant experiments: repeatability (R3B12), reproducibility against kernel radius (bin size = 12 HU), and reproducibility against bin size (kernel radius = 3 mm).
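The selection rule above can be sketched as a simple filter: a feature is kept only if its LCL of the ICC is at least 0.50 in all three relevant experiments. The feature names, experiment keys, and LCL values below are hypothetical placeholders for illustration and are not results from this study.

```python
LCL_THRESHOLD = 0.50  # threshold on the lower 95% confidence limit of the ICC

def select_precise(lcl_by_feature, threshold=LCL_THRESHOLD):
    """Return features whose LCL of the ICC meets the threshold in every experiment."""
    return sorted(
        name
        for name, lcls in lcl_by_feature.items()
        if all(value >= threshold for value in lcls.values())
    )

# Hypothetical LCL values per feature and experiment (illustration only)
example = {
    "feature_a": {"repeatability_R3B12": 0.62, "repro_radius_B12": 0.55, "repro_bin_R3": 0.95},
    "feature_b": {"repeatability_R3B12": 0.48, "repro_radius_B12": 0.71, "repro_bin_R3": 0.93},
    "feature_c": {"repeatability_R3B12": 0.50, "repro_radius_B12": 0.50, "repro_bin_R3": 0.50},
}

precise = select_precise(example)
print(precise)  # ['feature_a', 'feature_c']
```

In this sketch, feature_b is excluded because it fails the repeatability experiment even though it passes both reproducibility experiments.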

Imaging Habitats Computation

Habitats were computed for all lesions, four times per lesion, using either precise RFs or nonprecise RFs (ie, all computed RFs), in both original and perturbed images. Before computation, Spearman rank correlation coefficient (r) was used to eliminate highly correlated features (r ≥ 0.7) (23) at a significance value of P < .001. Nonredundant radiomics were clustered with Gaussian mixture models to obtain habitats. The optimal number of habitats was found using the Bayesian information criterion (24). The stability of habitats (ie, similarity of habitats computed in each original-perturbed pair) was measured according to the Dice similarity coefficient (DSC). See Appendix S4 for more details.
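As an illustration of the clustering and stability steps, the sketch below fits Gaussian mixture models with an increasing number of components, keeps the model with the lowest Bayesian information criterion, and computes the DSC between two binary habitat masks. It uses scikit-learn and synthetic data as assumptions; the function names, the maximum number of habitats, and the random seed are hypothetical, the redundancy filtering step is omitted, and matching of habitat labels between original and perturbed images is assumed to have been done beforehand. The study's own implementation is described in Appendix S4.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_habitats(voxel_features, max_habitats=5, seed=0):
    """Fit GMMs with 1..max_habitats components and keep the lowest-BIC model."""
    best_model, best_bic = None, np.inf
    for n_components in range(1, max_habitats + 1):
        model = GaussianMixture(n_components=n_components, random_state=seed)
        model.fit(voxel_features)
        bic = model.bic(voxel_features)
        if bic < best_bic:
            best_model, best_bic = model, bic
    # One habitat label per voxel, plus the selected number of habitats
    return best_model.predict(voxel_features), best_model.n_components

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

# Synthetic voxelwise features: two well-separated groups (illustration only)
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(10.0, 0.5, size=(200, 2)),
])
labels, n_habitats = fit_habitats(features)

# DSC for one habitat between a hypothetical original and perturbed labeling
example_dsc = dice([1, 1, 0, 0], [1, 0, 0, 0])
print(n_habitats, round(example_dsc, 3))
```

Because mixture component labels are arbitrary, a per-habitat DSC is only meaningful once corresponding habitats in the original and perturbed images have been paired.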

As an exploratory case study, we evaluated the biologic relevance of CT imaging habitats in an independent cohort of 13 patients with CT, mpMRI (including anatomic, diffusion-weighted, and perfusion MRI), and digitized hematoxylin-eosin images from biopsy. Methods and results related to this case study are available in Appendix S5.

Statistical Analysis

Following Koo and Li guidelines (22), we used the ICC based on a single-measurement, absolute-agreement, two-way mixed-effects model for the repeatability analysis. For reproducibility, we computed the ICC based on a single-measurement, consistency, two-way mixed-effects model. We assigned each feature’s repeatability and reproducibility to the following categories based on the LCL of the ICC: poor (LCL of ICC < 0.5); moderate (0.5 ≤ LCL of ICC < 0.75); good (0.75 ≤ LCL of ICC < 0.90); and excellent (LCL of ICC ≥ 0.90) following Koo and Li (22) and Zwanenburg et al (21). More details are provided in Appendix S6. A paired two-sided Wilcoxon signed rank test was conducted to evaluate the significance of differences in feature reproducibility between computation parameters, lesion location, and habitat stability with precise features or all features. The effect size of the tests was calculated with Cohen d and defined as small (d ≥ 0.20), medium (d ≥ 0.50), or large (d ≥ 0.80) (25). P < .05 was used as the threshold for statistical significance. All statistical tests were reviewed by a statistician (V.N.) and performed using Python, version 3.7.10 (Python Software Foundation). All code is publicly available at https://github.com/radiomicsgroup/precise-habitats.
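The two ICC forms and the precision categories can be illustrated with a short sketch. It computes single-measurement two-way ICC point estimates from ANOVA mean squares (absolute agreement and consistency) and maps a value to the categories listed above. This is an assumption-laden illustration: it returns point estimates only, whereas the study categorizes features by the lower 95% confidence limit, which is not computed here; the function names and the toy data are hypothetical.

```python
import numpy as np

def icc_single(measurements, agreement):
    """Single-measurement two-way ICC point estimate.

    measurements: array of shape (n_lesions, k_conditions).
    agreement=True gives absolute agreement; False gives consistency.
    """
    x = np.asarray(measurements, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between lesions
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between conditions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    denom = msr + (k - 1) * mse
    if agreement:
        # Absolute agreement also penalizes systematic offsets between conditions
        denom += k * (msc - mse) / n
    return (msr - mse) / denom

def precision_category(lcl):
    """Map an LCL of the ICC to the categories used in this study."""
    if lcl < 0.5:
        return "poor"
    if lcl < 0.75:
        return "moderate"
    if lcl < 0.90:
        return "good"
    return "excellent"

# Toy data: the second condition equals the first plus a constant offset of 1
toy = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_consistency = icc_single(toy, agreement=False)
icc_agreement = icc_single(toy, agreement=True)
print(round(icc_consistency, 3), round(icc_agreement, 3))  # 1.0 0.769
```

The toy example shows why the two forms differ: a constant offset leaves consistency at 1 but lowers absolute agreement, here to 10/13.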

Results

Repeatability Analysis

For every combination of computation settings (R1B12, R1B25, R3B12, and R3B25), the ICC was computed between the radiomics values computed from original (test) and perturbed (simulated retest) paired CT scans. Results showed that radiomics computed with R3B12 had the highest repeatability, with median (IQR) of the LCL of the ICC for all features of 0.442 (0.312–0.516), compared with 0.191 (0.116–0.382), 0.199 (0.103–0.344), and 0.415 (0.306–0.516) for settings R1B12, R1B25, and R3B25, respectively. Figure 2A shows the proportions of RFs with poor, moderate, good, and excellent repeatability.

Figure 2:

(A) Graph of repeatability distribution of radiomics features per setting. Most radiomics features exhibited poor repeatability. Features computed with a kernel radius (R) of 3 mm were more repeatable than those computed with a kernel radius of 1 mm. Bin size (B) changes did not affect repeatability. (B) Graph of repeatability distribution of radiomics features computed with setting R3B12 (kernel radius of 3 mm and bin size of 12 HU) per feature class for lung and liver lesions separately. First-order (FO) and gray-level run-length matrix (GRLM) features were more repeatable in liver lesions, while gray-level co-occurrence matrix (GLCM) features were more repeatable in lung lesions. GLDM = gray-level dependence matrix, GSZM = gray-level size zone matrix, LCL = lower 95% confidence limit of the intraclass correlation coefficient, NGTM = neighborhood gray-tone-difference matrix.

Regarding the effect of primary tumor and lesion location on repeatability, no evidence of differences in radiomics repeatability was found between different primary cancers for any of the settings (Fig S3). Moreover, although there were no major differences between liver and lung lesions in terms of the proportions of repeatable radiomics (Table S7), the type of repeatable radiomics differed (Fig 2B). For instance, first-order and gray-level run-length matrix features were more repeatable in liver lesions than in the lung, while gray-level co-occurrence matrix features were more repeatable in lung lesions. Figure S4 displays repeatability results for all features in the four repeatability experiments.

Reproducibility Analysis

Overall, radiomics values were more affected by changes in kernel radius than by changes in bin size (ie, RFs were more reproducible against bin size than against kernel radius). For reproducibility against kernel radius, RFs computed with a fixed bin size of 12 HU were more reproducible than those computed with a fixed bin size of 25 HU (Fig 3A), with median (IQR) of the LCL of the ICC for all features of 0.440 (0.330–0.526) and 0.437 (0.355–0.524), respectively (P < .001). The Wilcoxon test also detected significant differences in reproducibility against bin size: features computed with a fixed kernel radius of 3 mm were more reproducible than those computed with a fixed kernel radius of 1 mm (Fig 3B), with median (IQR) of the LCL of the ICC of 0.929 (0.853–0.988) and 0.833 (0.706–0.946), respectively (P < .001).

Figure 3:

(A) Graph of reproducibility distribution against kernel radius (R) for features computed with a fixed bin size (B) of 12 HU and 25 HU. Most features showed poor reproducibility against kernel radius. Features computed with a bin size of 12 HU were more reproducible (P < .001). (B) Graph of reproducibility distribution against bin size for features computed with a fixed kernel radius of 3 mm and fixed kernel radius of 1 mm. Most features showed good or excellent reproducibility against bin size. Features computed with a kernel radius of 3 mm were more reproducible (P < .001). (C) Graph of reproducibility distribution against kernel radius for features computed with a fixed bin size of 12 HU per feature class for lung and liver lesions separately. Features computed from lung lesions were more reproducible against kernel radius, especially for features belonging to gray-level co-occurrence matrix (GLCM) and gray-level run-length matrix (GRLM) classes. (D) Graph of reproducibility distribution against bin size for features computed with a fixed kernel radius of 3 mm per feature class for lung and liver lesions separately. Features computed from lung lesions were more reproducible against bin size, especially for features belonging to GLCM, first-order (FO), and neighborhood gray-tone-difference matrix (NGTM) features classes. GLDM = gray-level dependence matrix, GSZM = gray-level size zone matrix, LCL = lower 95% confidence limit of the intraclass correlation coefficient.

Analogous to repeatability, reproducibility was unaffected by primary tumor (Fig S5), while differences were observed between liver and lung lesions (Table 2). This result was true both in terms of number (Table S8) and type (Fig 3C, 3D) of reproducible RFs. RFs computed from lung lesions were more reproducible against both computation parameters than those computed from liver lesions (P < .001), especially for features belonging to gray-level co-occurrence matrix and gray-level run-length matrix classes. Figure S6 displays reproducibility results for all features in the four reproducibility experiments. All statistical details related to reproducibility are reported in Table S9.

Table 2:

Median (IQR) Lower 95% Confidence Limit of the Intraclass Correlation Coefficient for All Radiomics Features in Reproducibility Experiments

Identification of Precise Features for Liver and Lung Lesions

A threshold of 0.50 or greater for the LCL of the ICC was chosen to remove nonprecise features without overeliminating potentially informative features with moderate precision. The identification was carried out for lung and liver lesions separately based on precision results. The identification yielded a total of 25 precise RFs for liver and lung lesions, separately (Tables S10, S11). We added the neighborhood gray-tone-difference matrix coarseness feature for both lesion locations because it ranked among the top three features for repeatability and for reproducibility against bin size (extended explanation in Appendix S7), resulting in 26 precise RFs (Table 3). Figure 4 displays the results obtained in the three experiments for all features.

Table 3:

Precise Three-dimensional Radiomics Features in Liver and Lung Lesions

Figure 4:

Heat map displaying the lower 95% confidence limit of the intraclass correlation coefficient (LCL) results obtained in the three experiments used to identify precise features: repeatability (kernel radius [R] = 3 mm and fixed bin size [B] = 12 HU), reproducibility against kernel radius (fixed bin size = 12 HU), and reproducibility against bin size (fixed kernel radius = 3 mm) for lung and liver lesions separately. FO = first order, GLCM = gray-level co-occurrence matrix, GLDM = gray-level dependence matrix, GLRLM = gray-level run-length matrix, GLSZM = gray-level size zone matrix, NGTDM = neighborhood gray-tone-difference matrix.

Imaging Habitats Computed with Precise Features Show Increased Stability

Figure 5B shows an example of the resulting habitats for one liver lesion. For that lesion, the heat maps displaying significant correlations of precise RFs and nonprecise RFs are available in Figures S7 and S8, respectively. The final lists of nonredundant RFs and nonredundant-precise RFs are reported in Table S12.

Figure 5:

(A) Original and perturbed CT scans for one liver lesion (84-year-old man). (B) Graphs of example habitats obtained for the same lesion. Habitats computed with precise features show higher stability (measured via Dice similarity coefficient [DSC] of original-perturbed habitat pairs). Top row: Habitats obtained with precise features computed from the original image (left) and perturbed image (right). DSC scores for habitats 1, 2, and 3 are 0.976, 0.891, and 0.915, respectively. Bottom row: Habitats obtained with nonprecise (ie, all computed) features computed from the original image (left) and perturbed image (right). DSC scores for habitats 1, 2, and 3 are 0.751, 0.328, and 0.57, respectively. (C) Graph of quantification of habitat stability computed with precise features and nonprecise features for all lung and liver lesions. Boxes represent the IQR (25th–75th percentile), and the horizontal line inside the boxes represents the median value of the DSC. Whiskers represent the minimum and maximum values. Habitats computed with precise features show higher stability (Wilcoxon signed rank test, P < .0001). ♦ = outliers, **** = P < .0001.

The Wilcoxon signed rank test revealed a statistically significant (P < .001) increase in habitat stability when habitats were computed with precise radiomics only (Fig 5C). This result held for habitats computed in both liver and lung lesions, with a small effect size in both (Cohen d of 0.34 and 0.43, respectively). The median (IQR) DSC of habitats computed with nonprecise RFs was 0.532 (0.424–0.637) for lung and 0.587 (0.465–0.703) for liver lesions. For habitats computed with precise radiomics, the median (IQR) DSC was 0.601 (0.494–0.712) for lung and 0.651 (0.52–0.784) for liver lesions.
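The relative increases in median DSC reported for this comparison (13% in lung and 11% in liver lesions) can be checked directly from the medians given in this study. The short sketch below does that arithmetic; the variable names are hypothetical.

```python
# Median DSC values reported in this study
median_dsc = {
    "lung": {"all_features": 0.532, "precise_features": 0.601},
    "liver": {"all_features": 0.587, "precise_features": 0.651},
}

# Relative increase (%) of the median DSC when only precise features are used
relative_increase = {
    site: round(100 * (v["precise_features"] - v["all_features"]) / v["all_features"])
    for site, v in median_dsc.items()
}

print(relative_increase)  # {'lung': 13, 'liver': 11}
```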

Discussion

Computing robust and biologically meaningful tumor habitats (ie, phenotypically similar regions within tumors) from clinical CT imaging could greatly advance noninvasive, 3D evaluation of tumor heterogeneity, one of the hallmarks of cancer resistance. However, to achieve an effective clinical translation of CT habitats, excellent robustness of the underlying imaging features is essential. In this study, we examined 3D radiomics’ repeatability in a simulated test-retest scenario and reproducibility against kernel radius and bin size, two significant computation parameters. Our findings demonstrated that 3D radiomics exhibit poor repeatability and reproducibility against kernel radius but excellent reproducibility against bin size (median LCL of ICC of 0.442, 0.440, and 0.929, respectively). We identified different precise RFs for lung and liver lesions, with primary tumor having no impact on precision. CT tumor habitats computed with precise feature subsets were more stable than those using all computed features (13% and 11% increase in median DSC in lung and liver lesions, respectively; P < .001). In an independent cohort, CT tumor habitats correlated both quantitatively and qualitatively with heterogeneity observed in mpMRI habitats and histology.

To the best of our knowledge, this is the first study evaluating the repeatability and reproducibility of CT 3D radiomics against kernel radius and bin size in the most common tumor locations, lung and liver. A direct comparison with other precision studies is therefore limited. Previous studies reported higher proportions of repeatable features (26–29), which could be attributed to differences in the number of RFs, perturbation methods (27,28), and the use of phantoms instead of real patients (29). Our analysis found texture radiomics to be more precise than first-order (histogram) features, contrasting with earlier publications (11). This discrepancy might be due to histogram features being more influenced by outliers as they rely on absolute gray-level values. The high variability of 3D RFs against kernel radius highlights the importance of caution in interpreting radiomics studies lacking detailed computation-setting information. By providing this insight, we aim to minimize randomness in the radiomics workflow of future studies. Notably, we examined precision in the two common metastatic locations, lung and liver, and found differences between them, irrespective of primary tumor. One possible explanation for this difference could be the inherent differences in contrast-to-noise ratio between lung and liver lesions in CT imaging, which is generally higher in lung lesions. This difference indicates that performance of general radiomics models heavily depends on data characteristics, limiting generalizability to other tumor locations (30,31). This finding provides a valuable foundation for future studies involving tumor habitats in heterogeneous cohorts and large-scale multicentric studies assessing cancer lesion heterogeneity.

We have demonstrated that the use of precise radiomics results in a more stable computation of CT habitats. Without a ground truth available, habitat stability was assessed on a voxel-by-voxel basis using the DSC between original-perturbed habitats. This local comparison might underestimate the global similarity between the compared habitats, potentially explaining the seemingly modest DSC values. Despite the inherent limitations of voxel-by-voxel comparisons, the large scale and design of our study provide a statistically significant and reliable answer to a previously overlooked question: whether the preselection of RFs by repeatability and reproducibility leads to an enhanced computation of imaging habitats in lung and liver lesions.
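The voxel-by-voxel DSC comparison can be sketched as follows. This is a minimal illustration on flattened, hypothetical habitat label maps. It assumes that habitat labels have already been matched between the original and perturbed clusterings, a step the text above does not describe. The function name `dice` and the toy label maps are illustrative only.

```python
def dice(labels_a, labels_b, habitat):
    """Dice similarity coefficient of one habitat between two label maps.

    Both inputs are flat sequences of per-voxel habitat labels covering the
    same voxels in the same order. Returns NaN if the habitat is absent
    from both maps.
    """
    in_a = [label == habitat for label in labels_a]
    in_b = [label == habitat for label in labels_b]
    overlap = sum(1 for x, y in zip(in_a, in_b) if x and y)
    total = sum(in_a) + sum(in_b)
    return 2 * overlap / total if total else float("nan")


# Hypothetical label maps for a six-voxel lesion (not data from the study).
original = [1, 1, 2, 2, 3, 3]
perturbed = [1, 2, 2, 2, 3, 1]

for habitat in (1, 2, 3):
    print(f"habitat {habitat}: DSC = {dice(original, perturbed, habitat):.3f}")
```

Because a single relabeled voxel changes the overlap count directly, small lesions or thin habitats can show low DSC even when the overall habitat layout looks similar, which is consistent with the caveat about local comparison noted above.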

Our case study attempted to explore the biologic relevance of CT habitats, inspired by previous studies that highlighted the value of quantitative MRI-derived habitats for characterizing tumor heterogeneity (15,16). We computed habitats independently in CT and mpMRI, observing that these imaging modalities consistently detected tumor heterogeneity (ie, two or three habitats) in most lesions, reflecting similar underlying pathologic tissue compartments. Though conclusions are difficult to draw in view of our limited sample size, CT and mpMRI habitats may be capturing biologically relevant imaging phenotypes, potentially serving as noninvasive markers of cancer aggressiveness. This conclusion underscores the potential clinical utility of our approach, still in an exploratory context.

The generalizability of our results is subject to several key limitations. First, we focused on original features without considering convolutional image filters like wavelet and Laplacian of Gaussian filters. Convolutional filter-computed features have been shown to improve the predictive performance of radiomic signatures (32), yet their utility for habitat computation remains unknown, and their standardization is still being developed (33). Second, we did not evaluate the impact of semiautomatic segmentation on 3D feature precision, which warrants assessment in future work with multiple delineation experiments. In addition, all segmentations were performed by one radiologist, which might introduce bias since features depend on contours. Although involving multiple delineators would provide a more comprehensive view of feature robustness, this was beyond the scope of our current study. Similarly, our study did not assess reproducibility across different scanners. This aspect, while crucial, has been explored more extensively in other precision studies and was therefore beyond the scope of the current study. Finally, our precision analysis was limited to CT data. Future studies are needed to study in detail the stability and robustness of other imaging modalities such as MRI or PET as well as to investigate to what extent imaging phenotypes derived from different modalities overlap.

In summary, our comprehensive repeatability and reproducibility analysis identified two subsets of precise RFs for effectively computing stable CT tumor habitats in lung and liver lesions. By employing these precise RFs and using unsupervised clustering models, we demonstrated the ability to identify distinct tumor phenotypes in an exploratory analysis. CT tumor habitats correlated with biologically meaningful tumor aspects such as cellularity, vascularization, and necrosis, but further studies with larger sample sizes are needed to validate these findings. This approach to computing CT habitats holds great potential for studying intratumoral heterogeneity and monitoring cancer evolution throughout the course of the disease.
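The feature preselection summarized above can be sketched in a few lines. The 0.50 cutoff on the LCL of the ICC is taken from the study's abstract. The rule that a feature must meet the cutoff in all three analyses, the function name, the feature names, and the values are assumptions made for illustration.

```python
THRESHOLD = 0.50  # LCL of ICC cutoff reported in the abstract


def precise_features(lcl_by_feature, threshold=THRESHOLD):
    """Return the features whose LCL of ICC meets the cutoff in every analysis.

    Assumption: repeatability and reproducibility results are combined by
    requiring all of them to pass; the exact combination rule is not given
    in this section.
    """
    return sorted(
        name
        for name, lcls in lcl_by_feature.items()
        if all(value >= threshold for value in lcls.values())
    )


# Hypothetical LCL of ICC values (not results from the study).
lcl = {
    "glcm_contrast": {"repeatability": 0.61, "kernel_radius": 0.55, "bin_size": 0.93},
    "firstorder_mean": {"repeatability": 0.42, "kernel_radius": 0.58, "bin_size": 0.97},
    "glrlm_run_entropy": {"repeatability": 0.50, "kernel_radius": 0.50, "bin_size": 0.88},
}

print(precise_features(lcl))
```

Running the filter separately on lung and liver lesions would yield the two location-specific subsets referred to in this summary.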

K.B. and R.P.L. are co–senior authors.

R.P.L. is supported by “la Caixa” Foundation, a CRIS Foundation Talent Award (TALENT19-05), the FERO Foundation, the Instituto de Salud Carlos III-Investigación en Salud (PI18/01395 and PI21/01019), the Prostate Cancer Foundation (18YOUN19), and the Asociación Española Contra el Cáncer (PRYCO211023SERR, funding C. Macarro). M.L. is supported by PERIS PIF-Salut Grant. F.G. was funded by the Government of Catalonia (Beatriu de Pinós 2020 BP 00117) and now receives the support of a fellowship from the “la Caixa” Foundation (ID 100010434; fellowship code LCF/BQ/PR22/11920010). K.B. was funded by the Government of Catalonia (Beatriu de Pinós 2019 BP 00182). O.P. was funded by the “la Caixa” Foundation (ID 100010434, code LCF/BQ/DR21/11880027). A.T.B., R.D., P.N., and R.P.L. are supported by funding from the EU Horizon 2020 Research and Innovation Programme under grant agreement number 754923 (COLOSSUS). This study received support from AstraZeneca. AstraZeneca was not involved in the study design; collection, analysis, interpretation of data; manuscript writing and decision to submit the manuscript for publication; or any other aspect concerning this work.

Data sharing: The datasets used in this study are not publicly available since patients did not sign a consent form authorizing the public release of their data, even in anonymized form.

Disclosures of conflicts of interest: O.P. Fundació “la Caixa” (ID 100010434, code LCF/BQ/DR21/11880027). C. Macarro No relevant relationships. V.N. No relevant relationships. C. Monreal No relevant relationships. M.L. No relevant relationships. A.G.R. No relevant relationships. G.S. No relevant relationships. S.S. No relevant relationships. I.B. Institutional grants/contracts with AstraZeneca, Bicycle Therapeutics, Boehringer Ingelheim, Bristol Myers Squibb, Celgene, Dragonfly, Hookipa, GlaxoSmithKline, Immutep, ISA Pharmaceuticals, Janssen, Kura, Merck Serono, Merck Sharp & Dohme (MSD), Merus, Nanobiotics, Novartis, Regeneron, Sanofi, Pharmamar, Seattle Genetics, Shattuck Labs, VCN Biosciences, and Asociación Española Contra el Cáncer (AECC); consulting fees from Achilles Therapeutics, Boehringer Ingelheim, Bristol Myers Squibb, Cancer Expert Now, eTheRNA Immunotherapies, Merck Serono, Merck Sharp & Dohme (MSD), Merus, PCI Biotech, and Pierre Fabre; payment for presentation at educational event from Bristol Myers Squibb, Merck Serono, and Merck Sharp & Dohme (MSD); support for attending meeting and travel grant from Merck Sharp & Dohme (MSD), Merck Serono, and Immutep; Data Safety Monitoring Board for Incyte; leadership role with ESMO Head and Neck track, EORTC Head and Neck group, and Cancer Core Europe Clinical Taskforce; educational grant from Bristol Myers Squibb. M.V. Consulting fees from Roche and Novocure; payment/honoraria from Merck Sharp & Dohme, support for attending meetings/travel from Roche and Merck Sharp & Dohme; participation on board of BMS; GEINO group treasurer. M.E. No relevant relationships. J.C. No relevant relationships. A.T.B. No relevant relationships. R.D. Grants/contracts from Merck, Novartis, Daiichi-Sankyo, GlaxoSmithKline, and AstraZeneca; consulting fees from Roche and Boehringer Ingelheim; payment/honoraria from Roche, Boehringer Ingelheim, Ipsen, Amgen, Servier, Sanofi, Libbs, Merck Sharp & Dohme, Lilly, AstraZeneca, Janssen, Takeda, and GlaxoSmithKline; participation on a Data Safety Monitoring Board or Advisory Board for Roche; stock/stock options in Trialing SL. R.T. No relevant relationships. P.N. No relevant relationships. E.G. Grants/contracts from Novartis, Roche, Thermo Fisher, AstraZeneca, Taiho, BeiGene, and Janssen; consulting fees from Roche, Ellipses Pharma, Boehringer Ingelheim, Janssen Global Services, Seattle Genetics, Thermo Fisher, MabDiscovery, Anaveon, F-Star Therapeutics, Hengrui, Sanofi, and Incyte; payment or honoraria from Merck Sharp & Dohme, Roche, Thermo Fisher, Lilly, Novartis, and SeaGen; employment at NEXT Oncology; principal investigator or co-principal investigator for the following: Adaptimmune, Affimed, Amgen, Anaveon, AstraZeneca, Bicycletx, BioInvent International AB, Biontech SE, Biontech Small Molecules, Boehringer Ingelheim International, Catalym, Cyclacel Biopharmaceuticals, Cytovation AS, Cytomx, F.Hoffmann La Roche, F-Star Beta, Genentech, Genmab B.V., Hifibio Therapeutics, Hutchison Medipharma Limited, Icon, Imcheck Therapeutics, Immunocore, Incyte, Incyte Europe Sàrl, Janssen-Cilag International NV, Janssen-Cilag SA, Laboratorios Servier SL, Medimmune Llc, Merck & Co, Merck KGaA, Novartis Farmacéutica, Peptomyc, Pfizer Slu, Relay Therapeutics, Replimmune, Ribon Therapeutics, Ryvu Therapeutics SA, Seattle Genetics, Sotio, Sqz Biotechnologies, Symphogen A/S, Taiho Pharmaceutical USA, TKnife. F.G. Beatriu de Pinós fellowship 2020 BP 00117 funded by the Secretary of Universities and Research (Government of Catalonia); Junior Leader Fellowship LCF/BQ/PR22/11920010 from “la Caixa” Foundation (ID 100010434); 400 euros of travel support and three-night accommodation paid by the European Society for Magnetic Resonance in Medicine and Biology (ESMRMB) to attend the ESMRMB Lecture on MR “Introduction to diffusion-weighted MR imaging and spectroscopy” in Cardiff, UK, September 5–8, 2023, where F.G. gave a lecture entitled “Diffusion MRI in the body” on September 7, 2023. K.B. Postdoctoral grant Beatriu de Pinós AGAUR. R.P.L. CRIS Foundation Talent Award (TALENT19-05), Instituto de Salud Carlos III-Investigación en Salud (PI18/01395 and PI21/01019), Prostate Cancer Foundation (18YOUN19); participation on a Data Safety Monitoring Board or Advisory Board for Roche Pharma; Cancer Core Europe Imaging Task Force leader.

Abbreviations:

DSC
Dice similarity coefficient
ICC
intraclass correlation coefficient
LCL
lower 95% confidence limit
mpMRI
multiparametric MRI
RF
radiomics feature
3D
three-dimensional

References

  • 1. Fidler IJ. Tumor heterogeneity and the biology of cancer invasion and metastasis. Cancer Res 1978;38(9):2651–2660.
  • 2. Swanton C. Intratumor heterogeneity: evolution through space and time. Cancer Res 2012;72(19):4875–4882.
  • 3. Marusyk A, Janiszewska M, Polyak K. Intratumor Heterogeneity: The Rosetta Stone of Therapy Resistance. Cancer Cell 2020;37(4):471–484.
  • 4. McGranahan N, Swanton C. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell 2015;27(1):15–26. [Published correction appears in Cancer Cell 2015;28(1):141.]
  • 5. Xu H, Lv W, Feng H, et al. Subregional Radiomics Analysis of PET/CT Imaging with Intratumor Partitioning: Application to Prognosis for Nasopharyngeal Carcinoma. Mol Imaging Biol 2020;22(5):1414–1426.
  • 6. Vargas HA, Veeraraghavan H, Micco M, et al. A novel representation of inter-site tumour heterogeneity from pre-treatment computed tomography textures classifies ovarian cancers by clinical outcome. Eur Radiol 2017;27(9):3991–4001.
  • 7. Napel S, Mu W, Jardim-Perassi BV, Aerts HJWL, Gillies RJ. Quantitative imaging of cancer in the postgenomic era: Radio(geno)mics, deep learning, and habitats. Cancer 2018;124(24):4633–4649.
  • 8. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14(12):749–762.
  • 9. Kessler LG, Barnhart HX, Buckler AJ, et al. The emerging science of quantitative imaging biomarkers terminology and definitions for scientific studies and regulatory submissions. Stat Methods Med Res 2015;24(1):9–26.
  • 10. Sullivan DC, Obuchowski NA, Kessler LG, et al. Metrology Standards for Quantitative Imaging Biomarkers. Radiology 2015;277(3):813–825.
  • 11. Traverso A, Wee L, Dekker A, Gillies R. Repeatability and Reproducibility of Radiomic Features: A Systematic Review. Int J Radiat Oncol Biol Phys 2018;102(4):1143–1158.
  • 12. Pfaehler E, Zhovannik I, Wei L, et al. A systematic review and quality of reporting checklist for repeatability and reproducibility of radiomic features. Phys Imaging Radiat Oncol 2021;20:69–75.
  • 13. Ng F, Kozarski R, Ganeshan B, Goh V. Assessment of tumor heterogeneity by CT texture analysis: can the largest cross-sectional area be used as an alternative to whole tumor analysis? Eur J Radiol 2013;82(2):342–348.
  • 14. Xu L, Yang P, Yen EA, et al. A multi-organ cancer study of the classification performance using 2D and 3D image features in radiomics analysis. Phys Med Biol 2019;64(21):215009.
  • 15. Jardim-Perassi BV, Huang S, Dominguez-Viqueira W, et al. Multiparametric MRI and Coregistered Histology Identify Tumor Habitats in Breast Cancer Mouse Models. Cancer Res 2019;79(15):3952–3964.
  • 16. Divine MR, Katiyar P, Kohlhofer U, Quintanilla-Martinez L, Pichler BJ, Disselhorst JA. A Population-Based Gaussian Mixture Model Incorporating 18F-FDG PET and Diffusion-Weighted MRI Quantifies Tumor Tissue Classes. J Nucl Med 2016;57(3):473–479.
  • 17. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;45(2):228–247.
  • 18. Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging 2012;30(9):1323–1341.
  • 19. Zwanenburg A, Leger S, Agolli L, et al. Assessing robustness of radiomic features by image perturbation. Sci Rep 2019;9(1):614.
  • 20. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77(21):e104–e107.
  • 21. Zwanenburg A, Vallières M, Abdalah MA, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020;295(2):328–338.
  • 22. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15(2):155–163. [Published correction appears in J Chiropr Med 2017;16(4):346.]
  • 23. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Houghton Mifflin, 2003.
  • 24. Schwarz G. Estimating the Dimension of a Model. Ann Stat 1978;6(2):461–464.
  • 25. Cohen J. A power primer. Psychol Bull 1992;112(1):155–159.
  • 26. Bernatowicz K, Grussu F, Ligero M, Garcia A, Delgado E, Perez-Lopez R. Robust imaging habitat computation using voxel-wise radiomics features. Sci Rep 2021;11(1):20133.
  • 27. Jha AK, Mithun S, Jaiswar V, et al. Repeatability and reproducibility study of radiomic features on a phantom and human cohort. Sci Rep 2021;11(1):2055.
  • 28. Mottola M, Ursprung S, Rundo L, et al. Reproducibility of CT-based radiomic features against image resampling and perturbations for tumour and healthy kidney in renal cancer patients. Sci Rep 2021;11(1):11542.
  • 29. Berenguer R, Pastor-Juan MDR, Canales-Vázquez J, et al. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters. Radiology 2018;288(2):407–415.
  • 30. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging-“how-to” guide and critical reflection. Insights Imaging 2020;11(1):91.
  • 31. Shur JD, Doran SJ, Kumar S, et al. Radiomics in Oncology: A Practical Guide. RadioGraphics 2021;41(6):1717–1732.
  • 32. Demircioğlu A. The effect of preprocessing filters on predictive performance in radiomics. Eur Radiol Exp 2022;6(1):40.
  • 33. IBSI. IBSI 2. https://theibsi.github.io/ibsi2/. Accessed January 16, 2023.

Articles from Radiology: Artificial Intelligence are provided here courtesy of Radiological Society of North America