Abstract
Background:
Results of recent phantom studies show that variation in CT acquisition parameters and reconstruction techniques may make radiomic features largely nonreproducible and of limited use for prognostic clinical studies.
Purpose:
To investigate the effect of CT radiation dose and reconstruction settings on the reproducibility of radiomic features, as well as to identify correction factors for mitigating these sources of variability.
Materials and Methods:
This was a secondary analysis of a prospective study of metastatic liver lesions in patients who underwent single-energy dual-source contrast material–enhanced staging CT between September 2011 and April 2012. Acquisition and reconstruction parameters were varied, resulting in 28 CT data sets per patient that included different dose levels, section thicknesses, kernels, and reconstruction algorithm settings. By using a training data set (n = 76), reproducible intensity, shape, and texture radiomic features (reproducibility threshold, R2 ≥ 0.95) were selected, and correction factors were calculated by using a linear model to convert each radiomic feature to its estimated value in a reference technique. By using a test data set (n = 75), the reproducibility of hierarchical clustering based on 106 radiomic features measured with different CT techniques was assessed.
Results:
Data from 78 patients (mean age, 60 years ± 10; 33 women) with 151 liver lesions were included. The percentage of radiomic features deemed reproducible across all variations of the technical parameters was 11% (12 of 106). Of all technical parameters, reconstructed section thickness had the largest impact on the reproducibility of radiomic features when only that parameter was changed while all others were kept constant, with only 12.3% (13 of 106) of features remaining reproducible. Hierarchical cluster analysis showed improved clustering reproducibility when reproducible radiomic features with dedicated correction factors were used instead of all radiomic features (ρ = 0.39–0.71 vs ρ = 0.14–0.47).
Conclusion:
Most radiomic features are highly affected by CT acquisition and reconstruction settings, to the point of being nonreproducible. Selecting reproducible radiomic features along with study-specific correction factors offers improved clustering reproducibility.
CT is widely used in oncology for localizing malignancies, planning therapy, and assessing therapy response. Although tumors are complex biologic and physiologic structures with substantial heterogeneity (1), over the past decades imaging has been limited to a few quantitative features, such as enhancement and two-dimensional size measurements, for lesion characterization and response assessment.
With advances in computational power and image feature extraction, the analysis of a large number of quantitative image features of tumors has become feasible, which may allow better characterization of tumor heterogeneity (2). This feature extraction and analysis, also referred to as radiomics, has shown promising preliminary results for histopathologic staging (3–5). However, implementing radiomics in clinical routine may be challenging.
Recent evidence suggests that caution should be used when generalizing the prognostic impact of radiomics because of the confounding effect of differences in patient populations, acquisition parameters, and reconstruction techniques between institutions (2,6–9). It has been advocated that only reproducible radiomic features, that is, features that remain the same (eg, R2 > 0.95) when acquired with different acquisition parameters, reconstruction techniques, or operators (eg, at other institutions), should be used for training predictive models (10). In particular, results of phantom studies show that changing acquisition and reconstruction settings such as tube current, reconstruction kernel, and reconstruction algorithm can lower the percentage of radiomic features deemed reproducible to only 6%–43%, and the features lost may include a large portion of prognostic radiomic features (6,7,11–13). Unfortunately, our understanding of the clinical impact of such technical sources of variability in patients is limited to a small number of studies (9,14,15). Furthermore, phantoms may not realistically represent the complex morphology, architecture, and biology of a tumor in a patient, all of which directly affect intensity, shape, and texture radiomic features. This gap in knowledge is critical, as the development of a prognostic radiomic signature will most often involve large retrospective databases with substantial heterogeneity in image acquisition and reconstruction (eg, different radiation dose settings, reconstruction algorithms, and reconstruction settings) (1,2). Proposed compensation methods include restricted imaging protocols and radiomic feature transformation (6,9,16). However, clear guidance on how to account for these imaging technique variabilities is still limited.
Thus, the aim of this study was to investigate, within the same patient, the impact of CT radiation dose and reconstruction settings on the reproducibility of radiomic features, as well as to identify correction factors for mitigating these sources of variability.
Materials and Methods
This was a secondary analysis of data from a Health Insurance Portability and Accountability Act–compliant, institutional review board–approved prospective trial designed to assess the effect of iterative reconstruction and radiation dose reduction in abdominal CT. This study was approved by the institutional review board of Duke University, and a waiver of written informed consent was obtained. One author of the study (J.C.R.) is an employee of Siemens Healthineers, two authors (D.M. and M.M.) received research support (provision of software tools used in this study) from Siemens Healthineers, and another author (B.P.) receives research support and speaker's fees from GE Healthcare and research support from Siemens Healthineers (not related to the study). The authors who are not employees of or consultants for industry had control of any data or information that might present a conflict of interest.
Study Participants
Study participants were eligible for inclusion between September 2011 and April 2012 if (a) they underwent a single-energy dual-source contrast material–enhanced staging CT examination during the portal venous phase and (b) they had known or suspected liver metastases from colon cancer on the basis of previous multidetector CT or US findings or increased carcinoembryonic antigen levels (>5 ng/mL [>5 μg/L]). Study participants were excluded if (a) their total body weight was greater than 118 kg (260 lb) (our departmental cutoff for performing a dual-energy CT examination), (b) metal artifacts from the spine or from abdominal clips or stents affected the liver parenchyma, and/or (c) image quality was nondiagnostic because of increased image noise or inadequate timing of contrast medium administration. A total of 78 patients with 151 liver lesions fulfilled our inclusion criteria (Fig E1 [online]). A detailed description of the data acquisition protocol and of the reference standard for including a liver lesion can be found in Appendix E1 (online).
Data Reconstruction
The raw projection data were anonymized, exported from the scanner console onto an external hard drive, and transferred to an offline reconstruction tool (ReconCT; Siemens Healthineers, Forchheim, Germany). All images were reconstructed with an image matrix of 512 × 512 pixels. To assess the effect of imaging technique variability, the dose level, reconstructed section thickness, reconstruction kernel, and reconstruction algorithm were varied as described below, using the most clinically relevant settings.
Seven dose levels (100%, 87.5%, 75%, 62.5%, 50%, 37.5%, and 25% of the clinical dose level) were reconstructed by using a dedicated filtered back-projection (FBP) soft-tissue kernel (B31f) and a section thickness of 5.0 mm (increment, 4.0 mm), with the 100% dose level regarded as the reference level. More details on dose level calculations can be found in Appendix E1 (online).
Three section thicknesses, each reconstructed with an increment of 80% of the section thickness, were chosen: 1.0 mm (increment, 0.8 mm), 3.0 mm (increment, 2.4 mm), and 5.0 mm (increment, 4.0 mm), all with a dedicated FBP soft-tissue kernel (B31f); the 3.0-mm thickness (increment, 2.4 mm) was regarded as the reference thickness.
Thirteen reconstruction kernels were chosen, including 10 dedicated soft-tissue kernels (B20f, B30f, B31f, B35f, B40f, B41f, Br36f, Br40f, Bf39f, and Bf42f) and three quantitative tissue kernels (D30f, Qr36f, and Qr40f). Each image data set was reconstructed with a section thickness of 3.0 mm (increment, 2.4 mm) by using an FBP algorithm, with the B31f kernel regarded as the reference reconstruction setting.
Two reconstruction algorithms were compared: FBP and a second-generation iterative reconstruction algorithm (sinogram-affirmed iterative reconstruction, or SAFIRE; Siemens Healthineers), using dedicated soft-tissue kernels (B30f for FBP and I30f for SAFIRE) and SAFIRE strengths 1 through 5. Again, each image data set was reconstructed with a section thickness of 3.0 mm (increment, 2.4 mm), and FBP was regarded as the reference reconstruction algorithm.
In total, this resulted in 28 image data sets per patient (seven radiation dose levels, three section thicknesses, 13 reconstruction kernels, and five SAFIRE strength settings) (Fig 1).
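As a cross-check of the count above, the following Python sketch enumerates the reconstruction series as plain records and confirms the total of 28 data sets and the 80% increments. It is illustrative only: the record fields and helper names are hypothetical, the dose levels are generated as equal 12.5% steps from 100% to 25% (an assumption consistent with the dose labels in Table 3), and the 100% dose for the thickness, kernel, and algorithm series and the I30f kernel for every SAFIRE strength are also assumptions rather than reported settings.

```python
# Illustrative enumeration of the reconstructions per patient.
# Assumptions (not stated in the text): equal 12.5% dose steps, 100% dose for the
# thickness, kernel, and algorithm series, and kernel I30f for every SAFIRE strength.
# Field and function names are hypothetical.

DOSE_LEVELS_PCT = [100 - 12.5 * k for k in range(7)]  # 100.0, 87.5, ..., 25.0
THICKNESSES_MM = [1.0, 3.0, 5.0]
KERNELS = ["B20f", "B30f", "B31f", "B35f", "B40f", "B41f", "Br36f",
           "Br40f", "Bf39f", "Bf42f", "D30f", "Qr36f", "Qr40f"]
SAFIRE_STRENGTHS = [1, 2, 3, 4, 5]


def increment_mm(thickness_mm):
    """Reconstruction increment at 80% of the section thickness."""
    return round(0.8 * thickness_mm, 1)


def recon(series, thickness_mm, kernel="B31f", dose_pct=100.0, algorithm="FBP"):
    """One reconstructed data set as a plain record."""
    return {"series": series, "dose_pct": dose_pct, "thickness_mm": thickness_mm,
            "increment_mm": increment_mm(thickness_mm), "kernel": kernel,
            "algorithm": algorithm}


def build_protocol():
    protocol = [recon("dose", 5.0, dose_pct=d) for d in DOSE_LEVELS_PCT]
    protocol += [recon("thickness", t) for t in THICKNESSES_MM]
    protocol += [recon("kernel", 3.0, kernel=k) for k in KERNELS]
    protocol += [recon("algorithm", 3.0, kernel="I30f", algorithm=f"SAFIRE-{s}")
                 for s in SAFIRE_STRENGTHS]
    return protocol


protocol = build_protocol()
print(len(protocol))  # 28
```

The count simply adds the four series (7 + 3 + 13 + 5); reference settings that appear in more than one series are counted once per series, as in the text.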
Segmentation and Radiomic Feature Extraction
All patient identifiers (including name, age, sex, and medical record number) were removed from the images. Volumetric liver tumor segmentation was performed in consensus by two radiologists (D.M., with 10 years of experience in liver CT, and M.M., with 6 years of experience) by using a semiautomatic segmentation and radiomic feature extraction tool (Radiomics, version 1.0.9; Siemens Healthineers). In cases of disagreement between the two readers, the segmentation of the more experienced reader was used. Segmentation was performed on the thin-section FBP reconstructions (B31f kernel; section thickness, 1.0 mm; increment, 0.8 mm). To limit the effect of multiple observations within the same patient, the two most heterogeneous lesions larger than 20 mm were selected in individuals with multiple lesions of similar or identical clinical relevance. The 20-mm threshold was chosen to ensure that each lesion contained enough voxels for the radiomic features to be calculated. A detailed description of the radiomic feature extraction can be found in Appendix E1 (online).
Statistical Analysis
All statistical analyses were performed by using R (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria) (17). Lesions were separated into a training data set (n = 76) and a validation data set (n = 75), with no more than one lesion per patient in either data set. To identify reproducible radiomic features, that is, features measured consistently between a given CT technique and the reference technique, linear models were fitted on the training data set as follows:

$$x_{ij}^{(\mathrm{ref})} = \alpha_{jt} + \beta_{jt}\, x_{ij}^{(t)} + \varepsilon_{ijt},$$

where $x_{ij}^{(\mathrm{ref})}$ is the value of radiomic feature $j$ for patient $i$ measured with the reference CT technique, $x_{ij}^{(t)}$ is the value measured with technique $t$, $\varepsilon_{ijt}$ is the residual error, and $\alpha_{jt}$ and $\beta_{jt}$ are the shift and scale correction factors, respectively. Reproducible features were defined as features for which $x_{ij}^{(t)}$ explained at least 95% of the variance in $x_{ij}^{(\mathrm{ref})}$ (ie, the $R^2$ from the linear model was ≥ 0.95).
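A minimal NumPy sketch of the per-feature fit described above: for each feature and one alternative technique, ordinary least squares gives the shift and scale factors and the R2, and features with R2 ≥ 0.95 are flagged as reproducible. The function names and the array layout (lesions in rows, features in columns) are hypothetical; the study itself used R (17), and the details of its fitting procedure are in Appendix E1.

```python
import numpy as np


def fit_correction(x_ref, x_alt):
    """Ordinary least squares fit of x_ref = alpha + beta * x_alt for one feature.

    Returns (alpha, beta, r_squared): alpha is the shift and beta the scale factor.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_alt = np.asarray(x_alt, dtype=float)
    design = np.column_stack([np.ones_like(x_alt), x_alt])  # intercept + slope
    coef, *_ = np.linalg.lstsq(design, x_ref, rcond=None)
    alpha, beta = coef
    fitted = alpha + beta * x_alt
    ss_res = float(np.sum((x_ref - fitted) ** 2))
    ss_tot = float(np.sum((x_ref - x_ref.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return float(alpha), float(beta), r_squared


def select_reproducible(X_ref, X_alt, threshold=0.95):
    """Fit every feature (column) for one alternative technique.

    X_ref, X_alt: (n_lesions, n_features) arrays for the same training lesions.
    Returns per-feature alpha, beta, R^2, and a boolean mask of reproducible features.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    X_alt = np.asarray(X_alt, dtype=float)
    n_features = X_ref.shape[1]
    alpha, beta, r2 = (np.empty(n_features) for _ in range(3))
    for j in range(n_features):
        alpha[j], beta[j], r2[j] = fit_correction(X_ref[:, j], X_alt[:, j])
    return alpha, beta, r2, r2 >= threshold  # NaN R^2 compares as False


def apply_correction(X_alt, alpha, beta):
    """Map alternative-technique values onto the reference scale (broadcast over rows)."""
    return alpha + beta * np.asarray(X_alt, dtype=float)


# Synthetic check: feature 0 is an exact linear transform, feature 1 is nearly
# identical to the reference, and feature 2 is unrelated noise.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(76, 3))
X_alt = np.column_stack([
    2.0 * X_ref[:, 0] + 1.0,
    X_ref[:, 1] + rng.normal(scale=0.05, size=76),
    rng.normal(size=76),
])
alpha, beta, r2, reproducible = select_reproducible(X_ref, X_alt)
print(np.round(r2, 3), reproducible)
```

Because the model regresses reference values on alternative values, `apply_correction` directly returns estimated reference-technique values, which is how correction factors estimated in a training set could be reused in a separate data set.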
Using the validation data set, we assessed whether the hierarchical clustering of lesions based on radiomic features that was obtained when all patients were scanned with the reference CT technique could be reproduced when half of the patients were scanned with the reference technique and the other half with an alternative technique.
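The half-and-half comparison can be sketched as follows: a random half of the validation lesions have their reference-technique values replaced with values from an alternative technique, optionally passed through previously estimated correction factors. How the half was drawn is not specified in the text; choosing `n // 2` lesions uniformly at random, as below, is an assumption, and the function name is hypothetical.

```python
import numpy as np


def make_mixed_dataset(X_ref, X_alt, rng, alpha=None, beta=None):
    """Build a validation matrix in which a random half of the lesions (rows)
    carry alternative-technique feature values instead of reference values.

    If per-feature correction factors alpha and beta are given, the swapped rows
    are first mapped onto the reference scale as alpha + beta * x.
    Returns the mixed matrix and the indices of the swapped rows.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    X_alt = np.asarray(X_alt, dtype=float)
    n_lesions = X_ref.shape[0]
    swapped = rng.choice(n_lesions, size=n_lesions // 2, replace=False)
    alt_rows = X_alt[swapped]
    if alpha is not None and beta is not None:
        alt_rows = np.asarray(alpha) + np.asarray(beta) * alt_rows
    X_mixed = X_ref.copy()
    X_mixed[swapped] = alt_rows
    return X_mixed, swapped


# Toy example: reference values are 0, alternative values are 1; a correction
# with shift -1 and scale 1 maps the alternative values back to 0.
rng = np.random.default_rng(1)
X_ref = np.zeros((10, 2))
X_alt = np.ones((10, 2))
X_mixed, swapped = make_mixed_dataset(X_ref, X_alt, rng)
X_fixed, _ = make_mixed_dataset(X_ref, X_alt, rng,
                                alpha=np.array([-1.0, -1.0]), beta=np.array([1.0, 1.0]))
```

Comparing the clustering of `X_ref` with that of `X_mixed` (uncorrected) or `X_fixed` (corrected) mirrors the three columns of Table 3.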
Hierarchical clustering (a form of cluster analysis) is a commonly used statistical method for grouping items such as liver lesions on the basis of multiple measurements (eg, radiomic features) so that liver lesions in the same cluster have more similar radiomic features than liver lesions in other clusters. In radiomics, the goal of hierarchical clustering is to reveal biologically meaningful patterns in large data sets. The reproducibility of hierarchical clustering provides a useful summary statistic of the reproducibility of the underlying radiomic features, both because hierarchical clustering depends on the correlations among all the underlying radiomic features and because the clustering itself is frequently the end goal in many areas of data science. To assess the reproducibility of hierarchical clustering, we computed the nonparametric correlation between two hierarchical clusterings, denoted as ρ (18), where ρ = 0 indicates no correlation and ρ = 1 indicates perfect correlation. The degree of correlation between two hierarchical clusterings that would be expected by chance was assessed with permutation testing, and P values for ρ were determined on the basis of the number of permuted data sets with a correlation greater than ρ. Further details are given in Appendix E1 (online).
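Reference 18 defines the exact correlation statistic used in the study. As a stand-in, the sketch below uses one common nonparametric choice: the Spearman correlation between the cophenetic distances of two dendrograms, with a permutation P value obtained by shuffling the lesion labels of one dendrogram. Average linkage on z-scored features, Euclidean distance, and the number of permutations are illustrative assumptions, not settings reported by the authors, and unlike the ρ in Table 3 this statistic can in principle be negative.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cophenetic_matrix(X, method="average"):
    """Cluster lesions (rows) on z-scored features and return the square
    matrix of cophenetic distances of the resulting dendrogram."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    Z = linkage((X - X.mean(axis=0)) / sd, method=method, metric="euclidean")
    return squareform(cophenet(Z))  # condensed -> square, zero diagonal


def dendrogram_correlation(C1, C2):
    """Spearman correlation between the off-diagonal cophenetic distances."""
    upper = np.triu_indices_from(C1, k=1)
    return float(spearmanr(C1[upper], C2[upper])[0])


def permutation_test(C1, C2, n_perm=999, rng=None):
    """Observed correlation and a one-sided permutation P value obtained by
    shuffling the lesion labels of the second dendrogram."""
    rng = np.random.default_rng() if rng is None else rng
    observed = dendrogram_correlation(C1, C2)
    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(C2.shape[0])
        if dendrogram_correlation(C1, C2[np.ix_(perm, perm)]) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Toy example: a noisy copy of the features should cluster similarly.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(30, 6))
X_noisy = X_ref + rng.normal(scale=0.3, size=X_ref.shape)
C_ref = cophenetic_matrix(X_ref)
rho, p = permutation_test(C_ref, cophenetic_matrix(X_noisy), n_perm=199, rng=rng)
print(f"rho = {rho:.2f}, P = {p:.3f}")
```

In the study design, `C_ref` would come from the all-reference validation data and the second matrix from the half-and-half data set; the `+ 1` in the P value is a common convention that keeps the estimate strictly positive.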
Results
A total of 78 patients (mean age, 60 years ± 10 [standard deviation]) with 151 metastatic liver lesions were included: 33 women (mean age, 61 years ± 10; range, 28–74 years) and 45 men (mean age, 60 years ± 10; range, 34–81 years). The mean patient effective diameter, calculated as the square root of the product of the anteroposterior and transverse diameters, was 28.4 cm ± 3.1 (range, 22.0–35.1 cm), and the mean size-specific dose estimate was 13.1 mGy ± 4.5 (range, 6.7–28.6 mGy) (Table 1).
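The effective diameter defined above is a one-line formula; for example, a patient measuring 25 cm anteroposteriorly and 36 cm transversely has an effective diameter of √(25 × 36) = 30 cm. The function name below is ours, not part of any study software.

```python
import math


def effective_diameter_cm(ap_cm, lateral_cm):
    """Effective diameter as defined in the text: sqrt(AP diameter x transverse diameter)."""
    if ap_cm <= 0 or lateral_cm <= 0:
        raise ValueError("diameters must be positive")
    return math.sqrt(ap_cm * lateral_cm)


print(effective_diameter_cm(25.0, 36.0))  # 30.0
```

The geometric mean is always at or below the arithmetic mean of the two diameters, so elliptical patients get a slightly smaller effective diameter than their average width.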
Table 1:
Parameter | Datum |
---|---|
Age (y) | |
Overall | 60 ± 10 |
Male patients | 61 (28–74) |
Female patients | 60 (34–81) |
No. of male patients* | 45 (57.7) |
Effective diameter (cm) | 28.4 ± 3.1 |
SSDE unenhanced CT scan (mGy) | 13.1 ± 4.5 |
Liver lesions (n = 151) | |
Lesion size (mm) | 36.9 ± 13.4 |
Lesion attenuation (HU) | 99.2 ± 44.1 |
Lesion volume (mL) | 20.4 ± 16.2 |
Note.—Unless otherwise specified, data are means ± standard deviations, with ranges in parentheses. SSDE = size-specific dose estimate.
* Data in parentheses are percentages.
The percentage of radiomic features deemed reproducible (ie, R2 ≥ 0.95) across all variations of the technical parameters (dose level, reconstructed section thickness, reconstruction kernel, and reconstruction algorithm) was 11.3% (12 of 106). Notably, these included the maximum axial diameter and volume.
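Operationally, being reproducible "across all variations" is an intersection: a feature qualifies only if its R2 reaches the threshold for every alternative technique. The sketch below expresses that rule with a hypothetical input layout (technique name mapped to per-feature R2 values). Passing only the settings of one technical parameter would give per-parameter counts of the kind shown in Table 2, although that reading of the per-parameter analysis is our assumption.

```python
import numpy as np


def reproducible_for_all(r2_by_technique, threshold=0.95):
    """Return a boolean mask of features whose R^2 reaches `threshold`
    for every alternative technique in `r2_by_technique`.

    r2_by_technique: mapping of technique name -> 1-D sequence of per-feature R^2
    (hypothetical layout; one entry per alternative CT technique).
    """
    stacked = np.vstack([np.asarray(r2, dtype=float) for r2 in r2_by_technique.values()])
    # NaN comparisons evaluate to False, so an undefined R^2 never counts as reproducible.
    return np.all(stacked >= threshold, axis=0)


# Toy example with three features and two alternative techniques.
r2 = {"dose_25pct": [0.99, 0.96, 0.50], "thickness_1mm": [0.97, 0.90, 0.99]}
mask = reproducible_for_all(r2)
print(mask, f"{int(mask.sum())} of {mask.size} reproducible")
```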
Radiomic features in the shape category were the least affected by changes in a single technical parameter (with all other technical parameters kept constant), with an average of 87.5% (14 of 16) remaining reproducible (Fig 2). Radiomic features in the intensity and texture categories showed higher variability with changes in a single technical parameter, with average percentages of reproducible features of 41.2% (seven of 17) and 17.8% (13 of 73), respectively (Fig 3). Table 2 shows the number of reproducible radiomic features for changes in each technical parameter separately while all other technical parameters were kept constant. Results of the linear model with the corresponding shift and scale correction factors are summarized in Table E1 (online).
Table 2:
Feature | All Radiomic Features (n = 106) | Intensity Radiomic Features (n = 17) | Shape Radiomic Features (n = 16) | Texture Radiomic Features (n = 73) | Mean Attenuation* | Maximum Axial Diameter* | Volume* |
---|---|---|---|---|---|---|---|
Radiation dose level | 22 (20.8) | 2 (11.8) | 16 (100) | 4 (5.5) | 0 | 1 (100) | 1 (100) |
Reconstruction algorithm | 42 (39.6) | 9 (52.9) | 16 (100) | 17 (23.3) | 1 (100) | 1 (100) | 1 (100) |
Reconstruction kernel | 56 (52.8) | 15 (88.2) | 16 (100) | 25 (34.2) | 1 (100) | 1 (100) | 1 (100) |
Section thickness | 13 (12.3) | 1 (5.9) | 8 (50) | 4 (5.5) | 0 | 1 (100) | 1 (100) |
Note.—Data are numbers of reproducible radiomic features, with percentages of the total number of features in each category in parentheses. Reproducible radiomic features were defined as those with R2 ≥ 0.95.
* Parameters used in current response assessment criteria.
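The category averages reported for Table 2 (87.5%, 41.2%, and 17.8%) appear to follow from averaging the per-parameter counts over the four technical parameters and rounding to a whole number of features (14, 6.75 rounded to 7, and 12.5 rounded to 13). This is our reading of the numbers rather than a procedure stated by the authors; the sketch below reproduces it from the table counts.

```python
import math

# Reproducible-feature counts per technical parameter, copied from Table 2
# (dose level, reconstruction algorithm, reconstruction kernel, section thickness).
TABLE2 = {
    "shape": (16, [16, 16, 16, 8]),
    "intensity": (17, [2, 9, 15, 1]),
    "texture": (73, [4, 17, 25, 4]),
}


def category_average(total, counts):
    """Mean count over the parameters, rounded half up to whole features,
    and the corresponding percentage of the category."""
    mean_count = sum(counts) / len(counts)
    whole = math.floor(mean_count + 0.5)  # half up; Python's round() would send 12.5 to 12
    return mean_count, whole, 100 * whole / total


for name, (total, counts) in TABLE2.items():
    mean_count, whole, pct = category_average(total, counts)
    print(f"{name}: mean {mean_count:.2f} -> {whole} of {total} ({pct:.1f}%)")
# shape: mean 14.00 -> 14 of 16 (87.5%)
# intensity: mean 6.75 -> 7 of 17 (41.2%)
# texture: mean 12.50 -> 13 of 73 (17.8%)
```

Rounding half up matters for the texture category: Python's built-in `round` uses round-half-to-even and would give 12 of 73 (16.4%) instead of the reported 13 of 73.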
Of all technical parameters, reconstructed section thickness had the largest impact on the reproducibility of radiomic features, yielding the lowest percentage of reproducible radiomic features (12.3% [13 of 106]) (Fig 4). Only 50.0% of shape features (eight of 16), 5.9% of intensity features (one of 17), and 5.5% of texture features (four of 73) remained reproducible. By contrast, the reconstruction kernel had the smallest impact on reproducibility, with 52.8% (56 of 106) of features remaining reproducible (Fig 4), including 100% of shape features (16 of 16), 88.2% of intensity features (15 of 17), and 34.2% of texture features (25 of 73).
The results of the hierarchical cluster analysis, which groups liver lesions with similar radiomic features, showed improved clustering reproducibility when reproducible radiomic features were used instead of all radiomic features (ρ = 0.26–0.64 vs ρ = 0.14–0.47). As illustrated in Figures E2 and E3 (both online), the reproducibility of lesion clustering was further improved when dedicated correction factors were applied to counteract technical sources of variability (ρ = 0.39–0.71). Table 3 presents the clustering reproducibility of radiomic features for changes in each technical parameter.
Table 3:
Parameter | All Radiomic Features | Reproducible Radiomic Features | Reproducible and Corrected Radiomic Features | P Value for Random Features vs Reproducible Features | P Value for Corrected Random Features vs Reproducible Corrected Features |
---|---|---|---|---|---|
Section thickness (reference value, 3.0 mm) | |||||
1.0 mm | 0.14 (0.07, 0.2) | 0.43 (0.3, 0.56) | 0.59 (0.45, 0.69) | <.001 | <.001 |
5.0 mm | 0.14 (0.07, 0.22) | 0.36 (0.26, 0.5) | 0.71 (0.57, 0.83) | .001 | <.001 |
Dose level (reference value, 100%) | |||||
Dose level 25% | 0.18 (0.11, 0.29) | 0.32 (0.21, 0.47) | 0.55 (0.41, 0.72) | .02 | <.001 |
Dose level 37.5% | 0.19 (0.11, 0.28) | 0.36 (0.26, 0.47) | 0.47 (0.32, 0.62) | .006 | .001 |
Dose level 50% | 0.2 (0.11, 0.3) | 0.4 (0.26, 0.59) | 0.51 (0.38, 0.64) | .01 | <.001 |
Dose level 62.5% | 0.27 (0.19, 0.39) | 0.35 (0.2, 0.49) | 0.48 (0.34, 0.66) | .14 | .02 |
Dose level 75% | 0.37 (0.25, 0.49) | 0.33 (0.2, 0.49) | 0.39 (0.25, 0.55) | .59 | .06 |
Dose level 87.5% | 0.47 (0.34, 0.6) | 0.46 (0.32, 0.59) | 0.51 (0.38, 0.66) | .45 | .007 |
Algorithm (reference, FBP) | |||||
SAFIRE strength 1 | 0.28 (0.22, 0.38) | 0.28 (0.18, 0.36) | 0.51 (0.42, 0.6) | .46 | .154 |
SAFIRE strength 2 | 0.18 (0.1, 0.27) | 0.26 (0.14, 0.4) | 0.57 (0.43, 0.73) | .13 | .009 |
SAFIRE strength 3 | 0.15 (0.09, 0.23) | 0.31 (0.2, 0.41) | 0.5 (0.39, 0.66) | .02 | .02 |
SAFIRE strength 4 | 0.14 (0.07, 0.21) | 0.29 (0.18, 0.41) | 0.42 (0.32, 0.53) | .02 | .01 |
SAFIRE strength 5 | 0.13 (0.07, 0.2) | 0.32 (0.21, 0.44) | 0.46 (0.34, 0.62) | .003 | <.001 |
Kernel (reference, B30) | |||||
B20f | 0.15 (0.09, 0.22) | 0.32 (0.24, 0.44) | 0.44 (0.34, 0.55) | .01 | .023 |
B35f | 0.19 (0.12, 0.24) | 0.4 (0.26, 0.53) | 0.59 (0.47, 0.7) | .004 | .001 |
B40f | 0.27 (0.16, 0.36) | 0.55 (0.4, 0.72) | 0.67 (0.53, 0.78) | .006 | .001 |
B41f | 0.32 (0.23, 0.43) | 0.58 (0.47, 0.7) | 0.58 (0.47, 0.74) | .004 | .003 |
D30f | 0.3 (0.2, 0.41) | 0.54 (0.42, 0.68) | 0.55 (0.45, 0.69) | .001 | .004 |
Br36f | 0.17 (0.1, 0.25) | 0.36 (0.26, 0.47) | 0.59 (0.48, 0.7) | .008 | <.001 |
Br40f | 0.26 (0.16, 0.38) | 0.54 (0.37, 0.72) | 0.68 (0.54, 0.81) | .01 | <.001 |
Bf39f | 0.33 (0.23, 0.43) | 0.58 (0.47, 0.7) | 0.57 (0.47, 0.74) | .002 | .005 |
Bf42f | 0.19 (0.11, 0.28) | 0.44 (0.3, 0.6) | 0.60 (0.48, 0.75) | .006 | <.001 |
Qr36f | 0.17 (0.09, 0.26) | 0.36 (0.24, 0.47) | 0.59 (0.47, 0.69) | .006 | <.001 |
Qr40f | 0.31 (0.22, 0.42) | 0.64 (0.49, 0.8) | 0.67 (0.5, 0.82) | <.001 | <.001 |
Note.—Data are nonparametric correlation values (ρ values [range, 0–1, with 1 representing perfect agreement]) between two hierarchical clusters based on radiomic features for which half of patients were scanned by using the reference CT technique and half were scanned by using an alternative technique. FBP = filtered back projection, SAFIRE = sinogram-affirmed iterative reconstruction. Data in parentheses are 95% confidence intervals; P < .05 was considered to indicate a significant difference.
Discussion
The generalizability of radiomic features has been questioned because of the confounding effect of differences in acquisition parameters and reconstruction techniques within large single-center or multicenter databases. The aim of this study was to investigate, within the same patient, the impact of CT radiation dose and reconstruction settings on the reproducibility of radiomic features. Our study demonstrated a low percentage of reproducible liver tumor radiomic features when CT technical parameters were changed. Specifically, only 11.3% (12 of 106) of the radiomic features tested were deemed reproducible across all tested radiation dose and reconstruction CT settings. Reconstructed section thickness had the largest impact, with only 12.3% (13 of 106) of features remaining reproducible when only this technical parameter was altered. Hierarchical clustering, a key step in radiomic studies for identifying meaningful patterns, showed improved clustering reproducibility when reproducible radiomic features with dedicated correction factors were used instead of all radiomic features (P < .02).
Our data are in agreement with prior observations showing that only 6%–43% of radiomic features were reproducible in phantoms and in limited patient cohorts (6,7,11,12). Not all radiomic features were equally influenced by changes in CT technical parameters. For example, we found that radiomic features in the shape category (including the maximum axial diameter and volume) were less sensitive to changes in radiation dose and reconstruction CT settings than radiomic features in the texture and, to a lesser extent, the intensity categories (including mean attenuation). This has a direct impact on current response assessment criteria focused on attenuation changes, such as the Choi criteria or the vital tumor burden (19,20), as well as on new approaches that take radiomic features into account. According to a recent study in patients undergoing immunotherapy (4), radiomic features in the intensity and texture categories showed promising results in discriminating inflamed tumors from immune-desert tumors, an important distinction for predicting response. On the other hand, radiomic features in the shape category may also provide important clinical information, such as improved prediction of survival in patients with lung cancer and head and neck cancer (5).
In our study, the reconstructed section thickness demonstrated the largest impact on the reproducibility of radiomic features. Our results seem to conflict with those of prior investigations, which showed a stronger effect of the reconstruction kernel on the reproducibility of radiomic features (6,8). Several factors may have contributed to these differences, including the use of phantoms in previous studies compared with our patient cohort, differences in CT scanner platform and scan collimation, and the inclusion of both soft-tissue and non–soft-tissue (eg, bone and lung) reconstruction kernels in prior studies. We deliberately limited our analysis to soft-tissue kernels only, as this reflects current clinical practice during the evaluation of metastatic liver lesions. Of note, a recent phantom study (6) showed results comparable to our study for the impact of different soft-tissue kernels on the reproducibility of radiomic features.
Despite several efforts toward the standardization of radiomic features, including the Image Biomarker Standardisation Initiative for standardizing the nomenclature and computation of radiomic features (10,21), our results demonstrated large variability in most radiomic features with changes in CT technical parameters. These data emphasize the need for additional compensation methods to further improve the reproducibility of radiomic features. Our study suggests that using selected reproducible radiomic features, alone or in combination with empirically estimated correction factors, may mitigate the confounding effect of radiation dose and reconstruction CT settings. As a global measure of the reproducibility of the complete set of radiomic features across CT techniques, we focused on the correlation between the hierarchical clustering obtained from an idealized patient population in which every patient was scanned with the reference CT technique and the clustering obtained when half of the patients were scanned with the reference technique and half with an alternative technique. Hierarchical clustering is a widely used method in “-omics” fields for grouping high-dimensional data according to similarity (22,23) and is thought to aid in uncovering biologically meaningful patterns in the data. The ability to perform hierarchical clustering based on radiomic features in a manner that is robust to different CT technical parameters is therefore an important prerequisite for extracting biologically relevant information from these high-dimensional data sets.
Our data may have important clinical implications. In conjunction with alternative and complementary strategies, such as predefined ranges of scan and reconstruction settings or advanced transformation algorithms (6,9), the proposed method for compensating for technical sources of variability in radiomic features may facilitate the identification of explicit links between quantitative imaging data and molecular tumor signatures in large retrospective and multicenter databases. This may represent a critical step toward the development and implementation of personalized adaptive therapies.
In addition to the small number of patients and the retrospective single-center design, some limitations of our study merit consideration. First, our study did not include additional important sources of technical variability, such as different CT vendors and scanning techniques (eg, pitch, kilovoltage, or milliampere settings) or contrast material injection–related variables (eg, contrast material volume or injection rate). However, to our knowledge, this is the first study comparing, within the same patient, the effect of different radiation dose levels and different reconstruction settings on the reproducibility of radiomic features. Second, we did not evaluate the impact of operator-dependent sources of variability, such as intra- and interrater variability or the lesion segmentation algorithm (24). Third, we assessed only a small subset of the large number of radiomic features (>1000) that can be calculated by using different transformations, such as wavelet or Laplacian filters (25). Although these transformed features are treated as separate radiomic features, they are expected to correlate highly with the original basic radiomic features from which they were derived. Because our goal was to assess the reproducibility of radiomic features in general, we concentrated on these basic features rather than on the many radiomic features obtained after different transformations. Fourth, a training data set was necessary to estimate reproducible radiomic features and correction factors, which may not always be feasible in large databases. Fifth, we did not assess the interobserver variability of radiomic features, which is known to have an additional impact on their reproducibility (26).
Finally, although we did show that hierarchical clustering, a fundamental task in “-omics” sciences, can be performed more reliably across CT techniques by using reproducible radiomic features with correction factors, we did not prove that this improves the ability to extract biologically meaningful information from radiomic features.
In conclusion, our study corroborates the accumulating evidence on the limited reproducibility of radiomic features due to changes in radiation dose and reconstruction CT settings. By selecting a smaller subset of more reproducible radiomic features and applying study-specific correction factors, we found a significant improvement in the clustering reproducibility of radiomic features for metastatic liver lesions. Future studies, including large longitudinal single-center or multicenter studies with heterogeneous image acquisition and reconstruction parameters, are needed to determine the potential clinical benefit of radiomic features, and the method described here may help to minimize the effect of technical variability in such studies.
Supplementary Material
Summary
Many radiomic features are nonreproducible because of variation in CT radiation dose and reconstruction settings, but reproducibility can be improved by selecting those radiomic features that are reproducible along with dedicated correction factors.
Key Points
Of 106 radiomic features, the percentage deemed reproducible for any variation of the CT technical parameters was only 11%.
Of all technical parameters, reconstructed section thickness had the largest impact and reconstruction kernel the smallest impact on the percentage of radiomic features deemed reproducible (12% and 53%, respectively) when only one technical parameter was changed while all others were kept constant.
Compared with using all radiomic features (ρ = 0.14–0.47), hierarchical cluster analysis showed improved clustering reproducibility when reproducible radiomic features were used, both without (ρ = 0.26–0.64) and with (ρ = 0.39–0.71) dedicated correction factors.
Acknowledgments
Disclosures of Conflicts of Interest: M.M. Activities related to the present article: institution received research support from Siemens Healthineers (provision of software tools) without the payment of any money. Activities not related to the present article: has received speaker fees from Siemens Healthineers for clinical lectures on dual-energy CT. Other relationships: disclosed no relevant relationships. J.R. disclosed no relevant relationships. F.V. disclosed no relevant relationships. R.C.N. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: was a consultant for GE Healthcare until December 2017. Other relationships: disclosed no relevant relationships. J.C.R. Activities related to the present article: is an employee of Siemens Healthineers. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. J.S. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: receives small royalty payments (<$2000) from Sun Nuclear Gammex for the licensing of a phantom design unrelated to this study. Other relationships: disclosed no relevant relationships. B.N.P. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: institution has received grants from GE Healthcare and Siemens Healthineers; is on the speakers bureau of GE Healthcare. Other relationships: disclosed no relevant relationships. E.S. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is a consultant for Imalogix; has given expert testimony for Rubin Anders; institution has grants or grants pending with Siemens Healthineers and Bracco. Other relationships: institution receives royalties for intellectual property from GE Healthcare, Imalogix, 12Sigma, and SunNuclear. D.M. Activities related to the present article: institution has received research fellow support from Siemens Healthineers. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
Abbreviation
- FBP = filtered back projection
Footnotes
Conflicts of interest are listed at the end of this article.
References
1. Lubner MG, Smith AD, Sandrasegaran K, Sahani DV, Pickhardt PJ. CT Texture Analysis: Definitions, Applications, Biologic Correlates, and Challenges. RadioGraphics 2017;37(5):1483–1503.
2. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278(2):563–577.
3. Huang YQ, Liang CH, He L, et al. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. J Clin Oncol 2016;34(18):2157–2164.
4. Sun R, Limkin EJ, Vakalopoulou M, et al. A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol 2018;19(9):1180–1191.
5. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 2014;5(1):4006.
6. Berenguer R, Pastor-Juan MDR, Canales-Vázquez J, et al. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters. Radiology 2018;288(2):407–415.
7. Kim H, Park CM, Lee M, et al. Impact of Reconstruction Algorithms on CT Radiomic Features of Pulmonary Tumors: Analysis of Intra- and Inter-Reader Variability and Inter-Reconstruction Algorithm Variability. PLoS One 2016;11(10):e0164924.
8. Lu L, Ehmke RC, Schwartz LH, Zhao B. Assessing Agreement between Radiomic Features Computed for Multiple CT Imaging Settings. PLoS One 2016;11(12):e0166550.
9. Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I. Validation of A Method to Compensate Multicenter Effects Affecting CT Radiomics. Radiology 2019;291(1):53–59.
10. O’Connor JP, Aboagye EO, Adams JE, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol 2017;14(3):169–186.
11. Mackin D, Fave X, Zhang L, et al. Measuring Computed Tomography Scanner Variability of Radiomics Features. Invest Radiol 2015;50(11):757–765.
12. Shafiq-Ul-Hassan M, Zhang GG, Latifi K, et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys 2017;44(3):1050–1062.
13. Robins M, Solomon J, Hoye J, Abadi E, Marin D, Samei E. How reliable are texture measurements? In: Lo JY, Gilat Schmidt T, Chen GH, eds. Proceedings of SPIE: medical imaging 2018—physics of medical imaging. Vol 10573. Bellingham, Wash: International Society for Optics and Photonics, 2018;105733W.
14. Midya A, Chakraborty J, Gönen M, Do RKG, Simpson AL. Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J Med Imaging (Bellingham) 2018;5(1):011020.
15. Solomon J, Mileto A, Nelson RC, Roy Choudhury K, Samei E. Quantitative Features of Liver Lesions, Lung Nodules, and Renal Stones at Multi-Detector Row CT Examinations: Dependency on Radiation Dose and Reconstruction Algorithm. Radiology 2016;279(1):185–194.
16. Sollini M, Cozzi L, Antunovic L, Chiti A, Kirienko M. PET Radiomics in NSCLC: state of the art and a proposal for harmonization of methodology. Sci Rep 2017;7(1):358.
17. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2010.
18. Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 2015;31(22):3718–3720.
19. Choi H, Charnsangavej C, Faria SC, et al. Correlation of computed tomography and positron emission tomography in patients with metastatic gastrointestinal stromal tumor treated at a single institution with imatinib mesylate: proposal of new computed tomography response criteria. J Clin Oncol 2007;25(13):1753–1759.
20. Smith AD, Zhang X, Bryan J, et al. Vascular Tumor Burden as a New Quantitative CT Biomarker for Predicting Metastatic RCC Response to Antiangiogenic Therapy. Radiology 2016;281(2):484–498.
21. Hatt M, Vallieres M, Visvikis D, Zwanenburg A. IBSI: an international community radiomics standardization initiative. J Nucl Med 2018;59(Suppl 1):287. http://jnm.snmjournals.org/content/59/supplement_1/287.
22. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95(25):14863–14868.
23. Perou CM, Sørlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406(6797):747–752.
24. Joskowicz L, Cohen D, Caplan N, Sosna J. Inter-observer variability of manual contour delineation of structures in CT. Eur Radiol 2019;29(3):1391–1399.
25. Larue RTHM, Van De Voorde L, van Timmeren JE, et al. 4DCT imaging to assess radiomics feature stability: An investigation for thoracic cancers. Radiother Oncol 2017;125(1):147–153.
26. Pavic M, Bogowicz M, Würms X, et al. Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol 2018;57(8):1070–1074.