Scientific Reports. 2022 Mar 18;12:4732. doi: 10.1038/s41598-022-08301-1

Assessing radiomics feature stability with simulated CT acquisitions

Kyriakos Flouris 1, Oscar Jimenez-del-Toro 2, Christoph Aberle 3, Michael Bach 3, Roger Schaer 2, Markus M Obmann 3, Bram Stieltjes 3, Henning Müller 2,4, Adrien Depeursinge 2,5, Ender Konukoglu 1
PMCID: PMC8933485  PMID: 35304508

Abstract

The usefulness of quantitative medical imaging features in clinical studies was once disputed. Nowadays, advances in analysis techniques, for instance through machine learning, have made quantitative features progressively more useful in diagnosis and research. Tissue characterisation is improved by “radiomics” features, whose extraction can be automated. Despite these advances, the stability of quantitative features remains an important open problem. As features can be highly sensitive to variations in acquisition details, it is not trivial to quantify stability and efficiently select stable features. In this work, we develop and validate a Computed Tomography (CT) simulator environment based on the publicly available ASTRA toolbox (www.astra-toolbox.com). We show that the variability, stability and discriminative power of the radiomics features extracted from the virtual phantom images generated by the simulator are similar to those observed in a tandem phantom study. Additionally, we show that the variability is matched between a multi-center phantom study and simulated results. Consequently, we demonstrate that the simulator can be utilised to assess radiomics features’ stability and discriminative power.

Subject terms: Cancer imaging, Computational science, Data processing, Image processing, Machine learning, Software

Introduction

Computerized quantitative analysis of medical images is emerging as a promising approach in radiological practice and healthcare research1–4. These methods extract measurements quantifying various aspects of the image that include basic intensity statistics as well as more complicated metrics quantifying spatial intensity heterogeneity. Extracted measurements are then used as image biomarkers in predicting relevant outcomes. In recent years, numerous researchers demonstrated the capability of this approach for diagnosis, stratification, and prognosis5,6. Moreover, since the extraction of measurements as well as the prediction stage are all algorithmic, quantitative analysis is an efficient approach that can complement radiologists’ visual interpretation and analysis.

Advanced artificial intelligence techniques7, such as deep learning, take the quantitative analysis approach one step further8,9. They remove the need to engineer measurements to extract from images for a given task. Instead, they optimize their parameters to extract task-optimal measurements and predict based on them. In the respective language, quantitative measurements are called “features”. While the optimization requires a large number of data samples, i.e., training samples, if such large datasets exist, deep learning algorithms can provide substantial accuracy gains10.

An important limitation of the quantitative analysis approach is its sensitivity to variations in scanning conditions11. While the methods aim to extract measurements characterising the underlying tissue composition and microstructure, they are indeed measurements taken from the image, which is merely a representation of the tissue. Critically, image characteristics heavily rely on the acquisition details, e.g., resolution, radiation dose, noise, reconstruction algorithm. Depending on the properties of the algorithm and the measurement, the extracted quantities can be highly sensitive to variations in the image acquisition parameters12–14. This sensitivity inhibits the generalisation capabilities of such measurements. If acquisition details are not perfectly matched, two different images, even of the same tissue, will yield different measurements. A number of studies have reported the impact on CT radiomics analysis caused by the variability of acquisition parameters and post-process variables15–18. Any algorithm or analysis based on these measurements will therefore not be reliable for use with unseen scanners.

The ideal way to study the sensitivity of measurements is to perform test–retest studies19. This would consist of imaging a group of subjects under different acquisition settings. To study the sensitivity of a measurement, values extracted from corresponding images would be compared. When new measurements or new algorithms to extract measurements are proposed, they would be studied the same way. As this is not feasible for some imaging modalities, such as Computed Tomography (CT), because of the radiation exposure of patients in these studies, anthropomorphic printed phantoms have been proposed for CT variability studies20–22.

Phantom studies have been successfully used for various imaging modalities. Especially for CT, advances in 3D printing technologies allow printing volumetric patient images using materials with attenuation properties comparable to human tissue. Recent work reported variability studies using such phantoms23–25.

While phantoms make it possible to study variability without imaging cohorts, they still require acquiring and imaging phantoms. This can be costly as well as resource and time consuming. In this work, we study whether sensitivity analysis using advanced in silico CT simulators can yield similar results to real phantom imaging studies. To this end, a CT-scan simulator environment was set up using the publicly available26,27 ASTRA toolbox (www.astra-toolbox.com). Using a high-dose CT-image as input, the simulator outputs raw projections, which can be manipulated accordingly. For example, stationary and uncorrelated noise can be added. Additionally, the simulator allows for some freedom in geometrical parameters such as the number of projections, slice thickness, and distances. The CT-image can be reconstructed with a variety of algorithm choices, e.g. filtered back-projection and simultaneous iterative reconstruction technique.

The method is compared with an empirical anthropomorphic phantom variability study published in Ref.23. In this unique setup, the simulated phantom study is performed using the same original image from which the anthropomorphic phantom was printed and the study in Ref.23 conducted. In a sense, this can be viewed as the theoretical replication. The simulator environment was implemented to reconstruct images at different noise levels, reconstruction algorithms, and number of projections. To mimic repetition and introduce variability, each simulation parameter set was repeated via a variation of the Poisson noise random seed. For the simulated images, radiomics features were extracted and analysed. As the same source image is used for both the empirical phantom study and this work, direct comparison of the results of sensitivity analyses is possible.

The next section describes the CT simulator environment method including a brief introduction of the anthropomorphic phantom and the phantom study. In the “Results and discussion” section a comprehensive validation and comparison of the simulator with respect to the phantom study is presented. Furthermore, a stability and discriminative power analysis and discussions can be found in the same section. The paper is summarised in the “Conclusions” section.

Methods

First, we introduce the details of the novel anthropomorphic phantom created for the tandem phantom study23. A high dose CT-scan of this phantom is used as the simulator input. Second, the extracted radiomics features, the principal component analysis and the simulator environment are described in detail.

Anthropomorphic phantom and phantom CT acquisitions

Here, we provide brief details of the anthropomorphic phantom study presented in Ref.23 for completeness. For further details, we refer the reader to the original publication.

A realistic radio-opaque three-dimensional phantom was designed from real patient CT data. It combines a half-mirrored lung including a tumor with an abdominal liver section containing a metastasis from a colon carcinoma23. The phantom was manufactured by stacking sheets of paper printed with aqueous potassium iodide solution28. The lung tumor section is a replication of a publicly available patient data set for radiomics phantoms, from the Image Biomarker Standardization Initiative29. The lung section was used neither in this work nor in the tandem phantom CT study. Tissue equivalent attenuation at a defined energy spectrum was calibrated at 120 kVp. The contrast resolution of the printing technique in the phantom ranges from −100 to 1000 Hounsfield units (HU). Overall, no structures whose HU is below this paper-induced lower threshold can be represented. To test the contrast resolution, a circular intensity ramp was printed in the phantom running through an HU range of 0 to 1000. A reliable resolution of 2 HU difference was achieved. Consequently, the abdominal region was adequately depicted for a quantitative analysis within the printed HU range.

The phantom was imaged with a Siemens SOMATOM Definition Edge CT scanner (SSDE). To define the acquisition and image reconstruction parameters, a survey of clinical CT protocols was performed including 9 radiological institutes. All the CT scans in that study were acquired with the same acquisition parameters, which resulted in an approximate CT dose index of 10 mGy: a tube voltage of 120 kVp, a helical pitch factor of 1.0, a 0.5 s rotation time, and a tube current time product of 147 mAs. No automatic tube current modulation was used.

Typical reconstruction parameter settings for clinical protocols in thoracic and abdominal oncology were varied for the phantom study as follows: Reconstruction algorithm, iterative reconstruction (IR) or filtered back projection (FBP); reconstruction kernel, 2 standard soft tissue kernels per algorithm; slice thickness in millimeter, 1, 1.5, 2, 3; and slice spacing in millimeter, 0.75, 1, and 2. Series reconstructed with an IR algorithm used an ADMIRE (advanced modeled iterative reconstruction) at strength level 3. In total, 8 groups of parameter variations were selected for the phantom study to assess their impact on classic radiomics features. Initially, 20 repetition scans were performed without re-positioning of the phantom, followed by 10 repetitions with re-positioning between each measurement. Therefore, 30 distinct acquisitions were performed for each of 8 parameter variation groups.

In the abdominal section, six 3D regions of interest (ROIs) were manually annotated by a board-certified radiologist using a thin-section phantom series with 2 mm slice thickness and 1 mm spacing. The ROIs were annotated conservatively, well within the margins, thus no cross-check step of the annotations was performed by other radiologists. A polygonal outline was used on all slices individually to define the ROIs. The six ROI binary masks were stored in a 3D volume NIfTI format. Two normal liver tissue regions, two cysts, a hemangioma, and a liver metastasis from a colon carcinoma were included during the annotation process; the regions are shown in Fig. 1. Further details of the annotated regions and the 8 variation groups can be found in Jimenez-del-Toro et al.23.

Figure 1. Annotated regions of interest on the anthropomorphic phantom.

A multi-center phantom CT study was also carried out with 13 different scanners at selected locations in Switzerland. The scanners used were two Siemens SOMATOM Definition Edge, two Siemens SOMATOM Definition Flash, a Siemens SOMATOM Edge Plus, a Siemens SOMATOM X.Cite, a Philips Brilliance iCT, a GE BrightSpeed S, a Philips Brilliance CT 64, a GE Revolution Evo, a GE Revolution Apex, a Canon Aquilion Prime SP and a Canon Aquilion CXL. The same protocol was implemented (as closely as possible) in all acquisitions. A tube voltage of 120 kVp, a helical pitch factor of 1.0 and a 0.5 s rotation time were used. The tube current time product was adjusted accordingly to achieve the required dose of 10 mGy. The IR reconstruction algorithm was used with slice thicknesses 2 or 2.5 mm and slice spacing 1 or 1.25 mm.

Radiomics feature extraction and principal component analysis

From both the simulated and phantom CT scans, a total of 86 radiomics features were extracted in 3D from the manually segmented ROIs using the open source Pyradiomics python toolkit29. Definitions for the radiomics features are available in the Pyradiomics documentation online (https://pyradiomics.readthedocs.io/en/latest/features.html). The 86 features extracted include 18 first-order statistics, 22 grey level co-occurrence matrix features, 14 grey level dependence matrix features, 16 grey level run length matrix features and 16 grey level size zone matrix features, as described briefly in the “Radiomics extraction description” section in “Appendix”. The feature extraction parameters were set to their default values. More specifically, no filter was applied to the input image and a fixed bin width of 25 was used for the discretisation of the image grey level. Fixed bin size discretisation is defined such that a new bin is assigned for every intensity interval within the bin width starting at the lowest occurring intensity. Additionally, no normalization, no spatial resampling and no resegmentation were performed, and no HU cutoffs were used within the ROIs for the extraction. The distance between the center voxel and the neighbor, for which angles should be generated, was set to one pixel. Furthermore, for the first-order features the voxel array shift parameter was set to zero. For the grey level co-occurrence matrices, the co-occurrences were assessed in two directions per angle, which results in a symmetrical matrix. For the grey level dependence matrices, no cutoff value for dependence was set, i.e. a neighbouring voxel was always considered independent.
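The fixed bin width discretisation described above can be illustrated with a short sketch. It assumes the fixed-bin-width formula given in the PyRadiomics documentation, X_b = floor(X / W) − floor(min(X) / W) + 1; the function name and the HU values are hypothetical and are not taken from the study.

```python
import math

def discretise_fixed_bin_width(values, bin_width=25.0):
    """Map intensities to 1-based bin indices using a fixed bin width.

    Assumed formula (PyRadiomics documentation):
        X_b = floor(X / W) - floor(min(X) / W) + 1
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Index of the bin that contains the lowest occurring intensity.
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]

# Hypothetical HU values from a small ROI (illustration only).
roi_hu = [-12.0, 3.0, 24.9, 25.0, 61.0, 110.0]
bins = discretise_fixed_bin_width(roi_hu, bin_width=25.0)
print(bins)
```

With a bin width of 25, the lowest intensity always falls in bin 1, and values that lie in the same 25-unit interval share a bin index.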

For the phantom CT acquisitions, an analysis was carried out via the principal component analysis (PCA). The first two principal components of the 86 radiomics features from all 240 phantom CT acquisitions are shown in Figs. 6 and 7 with black markers. The ROIs can be separated into 4 distinct tissue classes, i.e. normal liver tissue, cyst, hemangioma, and liver metastasis. The differences between the four ROI classes (inter-class variation) are larger than all CT parameter variations (intra-class variation). ROIs from the normal liver tissue class are closer in the feature space than those from the other classes. All four classes remain linearly separable despite the CT parameter variations.
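As a rough illustration of how the first two principal components of a feature table can be obtained, the following sketch projects an acquisitions-by-features matrix onto its leading principal axes. Whether the features were standardised before the PCA is not stated in the text, so the z-scoring step is an assumption, and the input is random placeholder data rather than the study's measurements.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their leading principal components."""
    x = np.asarray(features, dtype=float)
    # Assumption: standardise each feature (column) before the PCA.
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (x - mean) / std
    # SVD of the standardised matrix; rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Placeholder: 240 acquisitions x 86 features (random values, not real data).
toy_features = rng.normal(size=(240, 86))
scores, explained = pca_project(toy_features)
print(scores.shape, explained)
```

Each row of `scores` would correspond to one marker in a plot of the first two principal components.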

Figure 6. Principal component analysis. The black markers indicate the empirical phantom study data of the region of interest with the same shape as the colored markers. The colored markers indicate the equivalent result of the simulator with iterative reconstruction and the parameter range shown in Table 1. The average value and variability of the two principal components are closely matched.

Figure 7. FBP principal component analysis. The black markers indicate the empirical phantom study data of the region of interest with the same shape as the colored markers. The colored markers indicate the equivalent result of the simulator with the filtered back-projection reconstruction and the parameter range shown in Table 1. The average value and variability of the two principal components are matched up to a shift of the first principal component.

Furthermore, the Wilcoxon statistic W was used to assess the stability and discriminative power of individual radiomics features23. We set a threshold of W < 1 to indicate a stable comparison. The top 10 ranked features of the phantom CT acquisitions are shown in the right-hand column of Table 3 in the Appendix.

Table 3. Comparison of the radiomics features with the highest stability and discriminative power, as observed in the simulation study and in the phantom acquisitions.

| Stability (%), simulator | Stability (%), empirical phantom study |
|---|---|
| Original glszm SmallAreaLowgreyLevelEmphasis | Original gldm LargeDependenceHighgreyLevelEmphasis |
| Original gldm LargeDependenceHighgreyLevelEmphasis | Original glszm SmallAreaLowgreyLevelEmphasis |
| Original gldm SmallDependenceLowgreyLevelEmphasis | Original firstorder Median |
| Original firstorder Kurtosis | Original glcm ClusterShade |
| Original glcm ClusterShade | Original gldm SmallDependenceLowgreyLevelEmphasis |
| Original glcm InverseDifferenceMomentNormalised | Original firstorder Mean |
| Original glszm LowgreyLevelZoneEmphasis | Original glcm InverseDifferenceMomentNormalised |
| Original glrlm ShortRunLowgreyLevelEmphasis | Original glrlm ShortRunLowgreyLevelEmphasis |
| Original glrlm LongRunHighgreyLevelEmphasis | Original glszm LowgreyLevelZoneEmphasis |
| Original glcm InverseDifferenceNormalised | Original glrlm LowgreyLevelRunEmphasis |

| Discriminative power (%), simulator | Discriminative power (%), empirical phantom study |
|---|---|
| Original firstorder 90Percentile | Original firstorder Median |
| Original firstorder Energy | Original firstorder Mean |
| Original firstorder Mean | Original glszm LargeAreaHighgreyLevelEmphasis |
| Original firstorder Median | Original firstorder Energy |
| Original firstorder RootMeanSquared | Original firstorder TotalEnergy |
| Original firstorder TotalEnergy | Original glszm greyLevelNonUniformity |
| Original gldm DependenceNonUniformity | Original firstorder Minimum |
| Original glrlm RunEntropy | Original glrlm greyLevelNonUniformity |
| Original glszm greyLevelNonUniformity | Original glszm LargeAreaLowgreyLevelEmphasis |
| Original glszm LargeAreaEmphasis | Original glszm SizeZoneNonUniformity |

CT simulator

The simulator environment was implemented to reconstruct images at different noise levels, with different reconstruction algorithms, and number of projections. Each simulation parameter set was repeated ten times for different noise random seeds to approximate repeated scans. For the simulated images, feature values were extracted and analysed. Specific features are explained in detail in the “Radiomics feature extraction and principal component analysis” section.

The ASTRA toolbox CT-scan and reconstruction simulator26 was employed for the purpose of this study. The simulator is based on simple geometric principles for the creation of projection data (sinograms). These sinograms can then be manipulated to mimic more realistic scenarios, for example through adding Poisson noise. Subsequently, the processed images are passed to the reconstruction algorithm. To match the simulator to the phantom acquisitions, a helical scanning sequence of pitch one was realised by explicitly specifying a sequence of helical projection vectors. These explicit projection vectors define the scanning frequency, i.e. the total number of projections. A conical beam is utilized and the target and detector are placed at 500 mm and 1000 mm respectively to approximate the real scanner geometry. A flat square detector of 512 by 512 of continuous pixels (1 mm) was implemented for simplicity. The number of detector pixels is higher than for a clinical CT scanner (approximately 1000 by 64) but is nevertheless compensated by an equivalent decrease in the scanning frequency, making the simulations simultaneously efficient and realistic.
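To make the idea of explicitly specified helical projection vectors more concrete, the sketch below generates one 12-element row per projection in the layout documented for ASTRA's `cone_vec` geometry (source position, detector centre, and the two detector pixel step vectors). Several choices are assumptions made only for illustration and are not taken from the paper: the source is placed 500 mm from the rotation axis and the detector 1000 mm from the source, the table advance per turn is set to the detector height, and the number of turns and the function name are hypothetical.

```python
import math

def helical_cone_vectors(n_proj, n_turns, src_origin=500.0, origin_det=500.0,
                         det_pixel=1.0, advance_per_turn=512.0):
    """Return one 12-element row per projection:
    (source x,y,z, detector centre x,y,z, u x,y,z, v x,y,z)."""
    rows = []
    z_total = advance_per_turn * n_turns
    for i in range(n_proj):
        t = i / n_proj                       # fraction of the scan completed
        angle = 2.0 * math.pi * n_turns * t
        z = z_total * (t - 0.5)              # centre the helix on z = 0
        c, s = math.cos(angle), math.sin(angle)
        src = (src_origin * s, -src_origin * c, z)
        det = (-origin_det * s, origin_det * c, z)
        u = (det_pixel * c, det_pixel * s, 0.0)  # step along a detector row
        v = (0.0, 0.0, det_pixel)                # step along a detector column
        rows.append(src + det + u + v)
    return rows

# Hypothetical example: 450 projections over a single turn.
vectors = helical_cone_vectors(n_proj=450, n_turns=1)
print(len(vectors), vectors[0])
```

Converted to a NumPy array of shape (n_proj, 12), such rows could be passed to `astra.create_proj_geom('cone_vec', det_row_count, det_col_count, vectors)`. Changing `n_proj` changes the scanning frequency in the sense used above.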

Random uncorrelated noise is added at the projection level by sampling from a Poisson distribution,

$$f(k;\lambda)=\frac{\lambda^{k}e^{-\lambda}}{k!},$$

where $f(k;\lambda)$ describes the probability of $k$ occurrences and $\lambda$ is both the expectation and the variance of the distribution. A background intensity $I_0$ is used to define the noise level, i.e. at each pixel of the projection images:

$$I_{\text{sampled}} \sim f\left(\lambda = I_0 e^{-I_{\text{image}}}\right), \qquad I_{\text{final}} = -\log\left(I_{\text{sampled}}/I_0\right).$$

$I_{\text{image}}$, $I_{\text{sampled}}$ and $I_{\text{final}}$ represent the initial image, sampled and final intensities, respectively. Therefore the background intensity is inversely related to the Poisson noise. Here we denote the added noise level $I_0^{-1}$ as $A$. The noise is added using the “add_noise_to_sino” function in the ASTRA toolbox.
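The noise model above can be sketched in a few lines of NumPy. This is a minimal illustration of the two equations as written, not a reproduction of ASTRA's `add_noise_to_sino`, whose internal scaling may differ; the function name, the projection values and the clipping of zero counts are assumptions.

```python
import numpy as np

def add_poisson_noise(line_integrals, i0, seed=None):
    """Apply I_sampled ~ Poisson(I0 * exp(-I_image)), then log-transform back."""
    rng = np.random.default_rng(seed)
    p = np.asarray(line_integrals, dtype=float)
    expected_counts = i0 * np.exp(-p)
    sampled = rng.poisson(expected_counts).astype(float)
    # Assumption: clip to one count to avoid log(0) for photon-starved pixels.
    sampled = np.maximum(sampled, 1.0)
    return -np.log(sampled / i0)

# Hypothetical projection with a constant line integral of 1.5.
clean = np.full((64, 64), 1.5)
noisy_a = add_poisson_noise(clean, i0=1e4, seed=1)   # A = 1 / I0 = 1e-4
noisy_b = add_poisson_noise(clean, i0=1e4, seed=2)   # a second "repetition"
print(noisy_a.mean(), noisy_a.var())
```

Changing only the seed yields a different noise realisation of the same projection, which is how repeated scans are imitated in the simulation study.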

To calibrate an appropriate noise level $A$, the average pixel-wise variance $\sigma^2$ is calculated for a range of values of $A$ and compared to the $\sigma^2$ of the phantom CT acquisitions, see Fig. 3. The average $\sigma^2$ of the low dose (1 mGy) and high dose (10 mGy) acquisitions are plotted as the horizontal lines. An approximate linear relation is observed between $\sigma^2$ and $A$, as seen from the linear fit. The crossing points between the horizontal line limits and the fitted line serve as a guide for a realistic $A$ parameter range. In the simulation study, noise levels close to the 10 mGy level were used, as this was the dose level used in the tandem phantom study.
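The approximately linear relation between the variance and the noise level is what a first-order (delta-method) approximation of the Poisson noise model suggests at the projection level. The following is a sketch under the assumptions that the expected count $\lambda = I_0 e^{-I_{\text{image}}}$ is large and that the reconstruction is linear in the projection data, so that the proportionality to $A$ carries over to the image domain with a different constant:

```latex
\operatorname{Var}\left(I_{\text{final}}\right)
  = \operatorname{Var}\left(\log I_{\text{sampled}}\right)
  \approx \frac{\operatorname{Var}\left(I_{\text{sampled}}\right)}{\mathbb{E}\left[I_{\text{sampled}}\right]^{2}}
  = \frac{\lambda}{\lambda^{2}}
  = \frac{e^{I_{\text{image}}}}{I_0}
  = A\, e^{I_{\text{image}}}
```

Under these assumptions the projection-level variance scales linearly with $A = I_0^{-1}$, which is consistent with the linear fit described above.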

Figure 3. Average pixel-wise variance of the iterative method simulated image plotted against the arbitrary noise measure. Black and grey lines denote the average variance of the high dose and low dose acquisitions.

The reconstructions are performed with the simultaneous iterative “SIRT” and filtered back-projection “FBP” 3D algorithms as implemented in the ASTRA toolbox. Specifically, “SIRT3D_CUDA” with 500 iterations and “FDK_CUDA” were used. The reconstruction kernels are fixed by the simulator and the slice thickness is the same as the pixel resolution, i.e. 1 mm. Furthermore, a distinct numerical random seed is used for the Poisson noise, to imitate repetitions as performed for the phantom CT acquisitions23. The method is numerically very efficient, as the total computational time on a modern GPU is on the order of minutes per complete reconstruction.

Results and discussion

Pilot simulations are carried out with the optimal set of parameters as seen in Table 1, i.e. minimum noise and maximum number of projections, for ten repetitions. First, the procedure is verified qualitatively by visual inspection of the reconstructions with the optimal parameter choice; axial snapshots can be seen in Fig. 2. Both reconstruction methods are sufficiently successful. The iterative reconstruction has low noise and no artifacts are visible. The FBP method is marginally noisier and exhibits some minor artifacts. These differences are expected, as the iterative method is theoretically superior, albeit more computationally expensive.

Table 1. Parameter choice for the simulation environment.

| Parameter | Range | Optimal |
|---|---|---|
| Noise level ($\sigma^2$/pixel in HU$^2$) | $2.5\times10^{-3}$–$2.8\times10^{-3}$ | $2.5\times10^{-3}$ |
| Number of projections | 150–450 | 450 |

Figure 2. Axial views of the anthropomorphic radio-opaque phantom. Left, original input. Middle, filtered back-projection reconstruction. Right, iterative reconstruction. Both reconstructions were obtained by the CT simulator.

Additionally, the Wilcoxon statistic W is employed to analyze the stability and discriminative power of the radiomics features as extracted from the simulated CTs. To this end, a study is carried out to mimic the phantom CT acquisitions. Namely the simulations are separated into 8 distinct groups with different projection number and reconstruction algorithms, see Table 2. Within each group, repetitions are achieved via a different Poisson noise random seed. Across the study, the same noise level was added at the projection stage. The ROIs are separated into 4 distinct tissue classes, i.e. normal liver tissue, cyst, hemangioma, and liver metastasis. The analysis aims to quantify stability and discriminative power of features across parameter groups using the class definitions.

Table 2. Parameter choice for the stability and discriminative power study.

| Group | Reconstruction | Projections |
|---|---|---|
| 1 | SIRT | 150 |
| 2 | SIRT | 200 |
| 3 | SIRT | 250 |
| 4 | SIRT | 300 |
| 5 | FBP | 150 |
| 6 | FBP | 200 |
| 7 | FBP | 250 |
| 8 | FBP | 300 |

For all simulations, the noise level was set to A=0.0001, i.e. equivalent to approximately 10 mGy dose, and 10 different random seeds were used to achieve repetitions within the group.

The result is depicted in Fig. 4. The stability (intra-class variation) percentage is calculated from pairwise comparisons among the 8 parameter variation groups. This process is repeated for all available tissue classes, while all other CT parameters are kept constant. Specifically, for each feature from each class, W is calculated between each pair of groups. The threshold for W is fixed in advance at 1: if W<1, the repetitions within the two groups being compared are taken to follow the same distribution and the pairwise comparison is considered successful. The percentage is calculated as the fraction of successful pairwise comparisons for each feature. The discriminative power (inter-class variation) is calculated via pairwise comparisons between tissue classes for each feature and group. In contrast to stability, here a successful comparison is achieved if W>1. Again, the percentage represents the fraction of successful comparisons.
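The pairwise counting scheme can be sketched as follows for a single feature. The exact definition of W is given in Ref.23 and is not restated here, so the sketch takes the statistic as a pluggable function; the placeholder used below (an absolute mean difference in pooled standard deviation units) is purely hypothetical, as are the feature values and function names.

```python
from itertools import combinations
from statistics import mean, stdev

def pairwise_success_rate(samples_by_key, statistic, success):
    """Percentage of key pairs whose statistic satisfies `success`."""
    pairs = list(combinations(sorted(samples_by_key), 2))
    hits = sum(success(statistic(samples_by_key[a], samples_by_key[b]))
               for a, b in pairs)
    return 100.0 * hits / len(pairs)

def toy_statistic(x, y):
    """Hypothetical stand-in for W (NOT the statistic used in the study)."""
    pooled = ((stdev(x) ** 2 + stdev(y) ** 2) / 2.0) ** 0.5
    return abs(mean(x) - mean(y)) / pooled

# Hypothetical values of one feature for one tissue class, keyed by group.
by_group = {
    1: [10.1, 10.3, 9.9, 10.2],
    2: [10.0, 10.4, 10.1, 9.8],
    3: [12.5, 12.7, 12.4, 12.6],
}
stability = pairwise_success_rate(by_group, toy_statistic, lambda w: w < 1.0)

# Hypothetical values of the same feature in one group, keyed by tissue class.
by_class = {
    "liver": [10.1, 10.3, 9.9, 10.2],
    "cyst": [2.0, 2.2, 1.9, 2.1],
    "metastasis": [6.0, 6.3, 5.8, 6.1],
}
discriminative_power = pairwise_success_rate(by_class, toy_statistic,
                                             lambda w: w > 1.0)
print(stability, discriminative_power)
```

In the study the stability comparisons are additionally pooled over all tissue classes and the discriminative power comparisons over all groups; the sketch shows only one class and one group.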

Figure 4. Percentage stability of features as intra-class comparison and discriminative power as inter-class comparison.

The results show that although the majority of radiomics features had low stability under CT parameter variations, as has been previously shown in other studies30–33, the discriminative power is high in the task of differentiating between the tissue classes. This relation is also observed in the phantom study that is mimicked23. The top ten features along each axis selected by the simulation environment, i.e., the virtual phantom, and by the phantom CT acquisitions are compared in Table 3. To demonstrate the ability of the simulation environment to predict stable features, the overlap of the best scoring features relative to the phantom CT acquisitions is plotted in Fig. 5. The x-axis represents an ascending percentage of features that are considered as the highest scoring group (e.g. 10% = top 9 out of 86 features) and the y-axis the percentage within that group that overlaps with the top features seen in the phantom study, http://links.lww.com/RLI/A632. For both stability and discriminative power, the overlap is consistently high, i.e., it lies above the linear relationship expected for uncorrelated lists.
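The overlap curve can be computed from two rankings of the same features, as in the following sketch. The feature names and rankings are placeholders, and rounding the group size to the nearest integer is an assumption chosen to agree with the example in the text (10% of 86 features gives the top 9).

```python
def top_overlap_curve(ranking_a, ranking_b, fractions):
    """For each fraction, percentage of the top group of A also in the top group of B."""
    if len(ranking_a) != len(ranking_b):
        raise ValueError("rankings must cover the same number of features")
    n = len(ranking_a)
    curve = []
    for frac in fractions:
        k = max(1, round(frac * n))   # e.g. 10% of 86 features -> top 9
        shared = set(ranking_a[:k]) & set(ranking_b[:k])
        curve.append(100.0 * len(shared) / k)
    return curve

# Hypothetical rankings of ten features, best first (names are placeholders).
simulator_rank = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
phantom_rank = ["f2", "f1", "f4", "f3", "f6", "f5", "f8", "f7", "f10", "f9"]
overlap = top_overlap_curve(simulator_rank, phantom_rank, [0.2, 0.5, 1.0])
print(overlap)
```

For two unrelated rankings, each feature in the top fraction p of one list has a probability of about p of also being in the top fraction of the other, so the expected overlap grows linearly with p. Curves above that line indicate agreement between the two rankings.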

Figure 5. Overlap of highest scoring features between simulation and phantom CT acquisitions plotted against the ascending percentage of features that are considered highest scoring, shown for both stability and discriminative power. The grey area represents un- or negatively-correlated overlap between the two methods.

Furthermore, the radiomics features of the simulator are compared to the empirical phantom acquisitions in Fig. 6 in a variability analysis. To this end, the principal components are calculated to investigate the similarity and variability of the radiomics, and we use the parameter range as shown in Table 1. As seen from Fig. 6, the simulation radiomics variability is in agreement with the empirical results. It should be noted here that the study was carried out in a semi-blind methodology, i.e. after matching all the possible parameters to reality, the best possible values were used to create the optimal reconstruction. Afterwards, an appropriate noise level was chosen using Fig. 3 for the purpose of this variability study.

The filtered back-projection method creates an inferior reconstruction as seen in Figs. 2 and 7. There is a larger discrepancy between empirical distributions and distributions obtained through the simulation. Nevertheless, when the PCs are plotted the results indicate that the variance is well within the experimental result. There is a lateral shifting of the first PC. The variability is well captured by the simulator for all six ROIs.

Furthermore, the radiomics features of the simulator are compared to the multi-center empirical phantom acquisitions in Fig. 8 with a PCA variability analysis. In the simulator, the projection number is fixed to 200 and 250 and the noise range is extended to $2.5\times10^{-3}$–$2.9\times10^{-3}$ ($\sigma^2$/pixel in HU$^2$). This parameter range mimics the fixed slice reconstruction thickness, and the extended noise range was used to represent the unknown differences inherent in a multi-center study. As seen from Fig. 8, the simulation radiomics variability is in agreement with the empirical results.

Figure 8. Multi-center principal component analysis. The black markers indicate the multi-center empirical phantom study data of the region of interest with the same shape as the colored markers. The colored markers indicate the equivalent result of the simulator with iterative reconstruction.

Discussion

Experimental comparison showed striking similarity between sensitivity analyses carried out with the anthropomorphic phantom and the CT simulator. Despite the approximations, the CT simulator was able to generate images with very similar characteristics, as quantified by the features studied here, to real images of the phantom. This is essentially a model whose parameters can be changed to match those observed in phantom studies. This similarity opens different avenues for further investigation and practical opportunities. For example, studies with multi-centre, multi-vendor data sets34 can be effortlessly scaled up and automated.

First, the results suggest that sensitivity analysis for new features or new ways to extract features can be initially performed with a CT simulator. This would substantially reduce the effort and cost required to study how well new radiomics features, radiomics analyses and image-based learning techniques generalize to new acquisition settings. This is crucial since this generalization is one of the most notorious obstacles to translating new quantitative image analysis technologies into clinical practice.

While hand-crafted features’ stability can in theory still be quantified with phantom studies, this approach remains very limited when it comes to assessing the stability of advanced algorithms that extract features in a data-driven way, e.g., neural networks. Phantom studies yield a very limited number of images, and this inhibits using them for assessing the stability of neural network-based feature extraction methods. The simulation study we showed here is a direct solution to this issue. The approach can use any CT image as a “phantom” and therefore yields a large number of images with which to perform accurate stability analysis of such advanced algorithms.

Second, training of learning-based methods can be modified to encourage robustness to variations of imaging characteristics during training. For instance, through extensive data augmentation one can gain robustness to variations in Magnetic Resonance Imaging (MRI) acquisitions of the same contrast35. As the CT simulator can generate images realistic enough to yield similar sensitivity analyses as an empirical phantom study, one can imagine using such simulators for training of highly robust deep learning models.

The CT simulator we used here did not consider various details of the acquisition due to simplifications of the system’s physical model. Our experiments with more complete models, such as Geant436, showed that using such models is challenging due to the difficulty in replicating a given scan and the computation time. Making more accurate simulators more user friendly and faster may improve the quality of the sensitivity analyses. In addition, a possible extension of this work is the application of an automatic segmentation method, allowing for automated and accurate determination of ROIs, which would be especially useful for the segmentation of liver tissue regions.

Conclusions

Based on the ASTRA toolbox, we have created an environment to reproduce artificial variability on an initial CT-image. The environment was verified to replicate the variation observed in empirical acquisitions via a principal component analysis, both for intra- and inter-scanner analyses. The methodology and simulation tool can accelerate the creation and testing of stable and discriminative radiomics features. More crucially, this tool can generate realistically variable CT-image datasets for training highly robust deep learning models.

Appendix

Best performing radiomics comparison

See Table 3.

Radiomics extraction description

The feature definitions follow the Imaging Biomarker Standardization Initiative29. More specifically, the following radiomics feature categories were used (see https://pyradiomics.readthedocs.io/en/latest/index.html):

  • 18 first-order statistics, which describe the distribution of voxel intensities within a region through common metrics. For example,
    $$\mathrm{energy} = \sum_{i=1}^{N_p} \left(X(i)\right)^2, \qquad (1)$$
    which is a measure of the magnitude (sum of the squares) of voxel values in an image. $X$ is a set of $N_p$ voxels included in the ROI.
  • 22 grey level co-occurrence matrix (GLCM) features. A GLCM of size $N_g \times N_g$ describes the second-order joint probability function of an image region constrained by the mask and is defined as $P(i,j \mid \delta,\theta)$. The $(i,j)$th element of this matrix represents the number of times the combination of levels $i$ and $j$ occurs in two pixels in the image that are separated by a distance of $\delta$ pixels along angle $\theta$. The distance $\delta$ from the center voxel is defined according to the infinity norm. In this work we computed symmetrical GLCMs. For example,
    $$\mathrm{Autocorrelation} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j)\, i\, j, \qquad (2)$$
    which is a measure of the magnitude of the fineness and coarseness of texture. $P(i,j)$ is the co-occurrence matrix for an arbitrary $\delta$ and $\theta$. $p(i,j)$ is the normalized co-occurrence matrix, equal to $P(i,j) / \sum_{i,j} P(i,j)$. $N_g$ is the number of discrete intensity levels in the image.
  • 14 grey level dependence matrix (GLDM) features. A GLDM quantifies grey level dependencies in an image. A grey level dependency is defined as the number of connected voxels within distance $\delta$ that are dependent on the center voxel. A neighbouring voxel with grey level $j$ is considered dependent on a center voxel with grey level $i$ if $|i-j| \le \alpha$. In a grey level dependence matrix $P(i,j)$, the $(i,j)$th element describes the number of times a voxel with grey level $i$ and with $j$ dependent voxels in its neighbourhood appears in the image. For example,
    $$\mathrm{Grey\ level\ variance} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_d} p(i,j)\,(i-\mu)^2, \quad \text{where} \quad \mu = \sum_{i=1}^{N_g} \sum_{j=1}^{N_d} i\, p(i,j), \qquad (3)$$
    measures the variance in grey level in the image. $N_g$ is the number of discrete intensity values in the image. $N_d$ is the number of discrete dependency sizes in the image. $N_z$ is the number of dependency zones in the image. $P(i,j)$ is the dependence matrix. $p(i,j)$ is the normalized dependence matrix, defined as $p(i,j) = P(i,j)/N_z$.

  • 16 grey level run length matrix (GLRLM) features. A GLRLM quantifies grey level runs, which are defined as the length, in number of pixels, of consecutive pixels that have the same grey level value. In a grey level run length matrix $P(i,j \mid \theta)$, the $(i,j)$th element describes the number of runs with grey level $i$ and length $j$ that occur in the image (ROI) along angle $\theta$. The value of a feature is calculated on the GLRLM for each angle separately, after which the mean of these values is returned. For example, grey level variance, defined as above.

  • 16 grey level size zone matrix (GLSZM) features. A GLSZM quantifies grey level zones in an image. A grey level zone is defined as the number of connected voxels that share the same grey level intensity. A voxel is considered connected if the distance is 1 according to the infinity norm (26-connected region in 3D). In a grey level size zone matrix $P(i,j)$, the $(i,j)$th element equals the number of zones with grey level $i$ and size $j$ that appear in the image. Contrary to the GLCM and GLRLM, the GLSZM is rotation independent, with only one matrix calculated for all directions in the ROI. For example, grey level variance, defined as above.
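The example formulas in Eqs. (1) to (3) can be written out in a few lines of NumPy. The sketch below is only an illustration and is not the PyRadiomics implementation: it assumes a 2D image whose grey levels are already discretised to integers 1..N_g, builds the GLCM for a single offset rather than averaging over angles, and uses hypothetical function names. The grey level variance function accepts any count matrix P(i, j) whose rows index grey levels (GLDM, GLRLM or GLSZM) and normalises it by the sum of its entries.

```python
import numpy as np


def energy(roi_values):
    """Eq. (1): sum of squared voxel values in the ROI."""
    x = np.asarray(roi_values, dtype=float)
    return float(np.sum(x ** 2))


def glcm(image, n_levels, offset=(0, 1), symmetric=True):
    """Co-occurrence counts P(i, j) for one 2D offset (row, column).

    `image` must contain integer grey levels in 1..n_levels.
    """
    img = np.asarray(image)
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c] - 1, img[r2, c2] - 1] += 1
    if symmetric:
        P = P + P.T  # count each pair in both directions
    return P


def autocorrelation(P):
    """Eq. (2): sum over i, j of p(i, j) * i * j, with p = P / sum(P)."""
    p = P / P.sum()
    levels = np.arange(1, P.shape[0] + 1)
    return float(levels @ p @ levels)


def grey_level_variance(P):
    """Eq. (3): sum over i, j of p(i, j) * (i - mu)^2."""
    p = P / P.sum()
    levels = np.arange(1, P.shape[0] + 1)
    p_i = p.sum(axis=1)          # marginal distribution over grey levels
    mu = float(levels @ p_i)     # mu = sum over i, j of i * p(i, j)
    return float(((levels - mu) ** 2) @ p_i)


# Toy 2x2 image with two grey levels.
toy = np.array([[1, 2],
                [2, 1]])
P = glcm(toy, n_levels=2)
print(energy(toy), autocorrelation(P), grey_level_variance(P))
```

Because grey level variance depends only on the marginal distribution over grey levels, the same function applies to the GLDM, GLRLM and GLSZM once the corresponding count matrix has been built.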

Author contributions

K.F. and E.K. wrote the main manuscript text. K.F. prepared Figs. 1, 2, 3, 4, 5 and 8 and the tables; K.F. and O.J. prepared Figs. 6 and 7. K.F. developed the simulator environment and analysis codes. O.J. and E.K. contributed to the analysis. O.J., C.A., M.B., R.S., M.M.O., B.S., H.M., A.D. and E.K. reviewed the manuscript.

Funding

This project was supported by the grant #2018-531 of the Strategic Focal Area "Personalized Health and Related Technologies (PHRT)" of the ETH Domain and by the Swiss Personalised Health Network with the QA4IQI Quality assessment for interoperable quantitative computed tomography imaging project DMS2445.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Kim T-Y, Son J, Kim K-G. The recent progress in quantitative medical image analysis for computer aided diagnosis systems. Healthcare Inform. Res. 2011;17:143–149. doi: 10.4258/hir.2011.17.3.143.
  • 2. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images are more than pictures, they are data. Radiology. 2016;278:563–577. doi: 10.1148/radiol.2015151169.
  • 3. Zerunian M, et al. CT based radiomic approach on first line pembrolizumab in lung cancer. Sci. Rep. 2021;11:6633. doi: 10.1038/s41598-021-86113-5.
  • 4. Fu Z, et al. CT features of COVID-19 patients with two consecutive negative RT-PCR tests after treatment. Sci. Rep. 2020;10:11548. doi: 10.1038/s41598-020-68509-x.
  • 5. Liu Z, et al. The applications of radiomics in precision diagnosis and treatment of oncology: Opportunities and challenges. Theranostics. 2019;9:1303–1322. doi: 10.7150/thno.30309.
  • 6. Court LE, Rao A, Krishnan S. Radiomics in cancer diagnosis, cancer staging, and prediction of response to treatment. Transl. Cancer Res. 2016;5:337. doi: 10.21037/tcr.2016.07.14.
  • 7. Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 2001;23:89–109. doi: 10.1016/S0933-3657(01)00077-X.
  • 8. Bermejo-Peláez D, Ash SY, Washko GR, Estépar RSJ, Ledesma-Carbayo MJ. Classification of interstitial lung abnormality patterns with an ensemble of deep convolutional neural networks. Sci. Rep. 2020;10:1–15. doi: 10.1038/s41598-019-56989-5.
  • 9. Hu Q, Whitney HM, Giger ML. A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI. Sci. Rep. 2020;10:1–11. doi: 10.1038/s41598-020-67441-4.
  • 10. Kleppe A, et al. Designing deep learning studies in cancer diagnostics. Nat. Rev. Cancer. 2021;21:199–211. doi: 10.1038/s41568-020-00327-9.
  • 11. Yip SS, Aerts HJ. Applications and limitations of radiomics. Phys. Med. Biol. 2016;61:R150–R166. doi: 10.1088/0031-9155/61/13/R150.
  • 12. Prayer F, et al. Variability of computed tomography radiomics features of fibrosing interstitial lung disease: A test–retest study. Methods. 2021;188:98–104. doi: 10.1016/j.ymeth.2020.08.007.
  • 13. Bae Y-K, Lee J-W, Hong S. Effects of image distortion and hounsfield unit variations on radiation treatment plans: An extended field-of-view reconstruction in a large bore CT scanner. Sci. Rep. 2020;10:1–8. doi: 10.1038/s41598-020-57422-y.
  • 14. Mackin D, et al. Measuring computed tomography scanner variability of radiomics features. Investig. Radiol. 2015;50:757–765. doi: 10.1097/rli.0000000000000180.
  • 15. Shafiq-Ul-Hassan M, et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med. Phys. 2017;44:1050–1062. doi: 10.1002/mp.12123.
  • 16. Schmidt RM, et al. Assessment of CT to CBCT contour mapping for radiomic feature analysis in prostate cancer. Sci. Rep. 2021;11:22737. doi: 10.1038/s41598-021-02154-w.
  • 17. Balagurunathan Y, et al. Reproducibility and prognosis of quantitative features extracted from CT images. Transl. Oncol. 2014;7:72–87. doi: 10.1593/tlo.13844.
  • 18. Delgadillo R, et al. Repeatability of CBCT radiomic features and their correlation with CT radiomic features for prostate cancer. Med. Phys. 2021;48:2386–2399. doi: 10.1002/mp.14787.
  • 19. Fedorov A, et al. An annotated test–retest collection of prostate multiparametric MRI. Sci. Data. 2018;5:180281. doi: 10.1038/sdata.2018.281.
  • 20. Zhang F, et al. Design and fabrication of a personalized anthropomorphic phantom using 3d printing and tissue equivalent materials. Quant. Imaging Med. Surg. 2018;9:94. doi: 10.21037/qims.2018.08.01.
  • 21. Gear JI, et al. Abdo-man: A 3d-printed anthropomorphic phantom for validating quantitative SIRT. EJNMMI Phys. 2016;3:1–16. doi: 10.1186/s40658-016-0151-6.
  • 22. Irnstorfer N, Unger E, Hojreh A, Homolka P. An anthropomorphic phantom representing a prematurely born neonate for digital X-ray imaging using 3d printing: Proof of concept and comparison of image quality from different systems. Sci. Rep. 2019;9:1–12. doi: 10.1038/s41598-019-50925-3.
  • 23. Jimenez-Del-Toro O, et al. The discriminative power and stability of radiomics features with computed tomography variations: Task-based analysis in an anthropomorphic 3D-printed CT phantom. Investig. Radiol. 2021;56:820. doi: 10.1097/RLI.0000000000000795.
  • 24. Cheng CP, Halchenko YO. A new virtue of phantom MRI data: Explaining variance in human participant data. F1000Research. 2020;9:1131. doi: 10.12688/f1000research.24544.1.
  • 25. Jahnke P, et al. A radiopaque 3D printed, anthropomorphic phantom for simulation of CT-guided procedures. Eur. Radiol. 2018;28:4818–4823. doi: 10.1007/s00330-018-5481-4.
  • 26. van Aarle W, et al. The ASTRA toolbox: A platform for advanced algorithm development in electron tomography. Ultramicroscopy. 2015;157:35–47. doi: 10.1016/j.ultramic.2015.05.002.
  • 27. van Aarle W, et al. Fast and flexible X-ray tomography using the astra toolbox. Opt. Express. 2016;24:25129–25147. doi: 10.1364/OE.24.025129.
  • 28. Jahnke P, et al. Radiopaque three-dimensional printing: A method to create realistic CT phantoms. Radiology. 2017;282:569–575. doi: 10.1148/radiol.2016152710.
  • 29. Zwanenburg A, et al. The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295:328–338. doi: 10.1148/radiol.2020191145.
  • 30. Berenguer R, et al. Radiomics of CT features may be nonreproducible and redundant: Influence of CT acquisition parameters. Radiology. 2018;288:407–415. doi: 10.1148/radiol.2018172361.
  • 31. Traverso A, Wee L, Dekker A, Gillies R. Repeatability and reproducibility of radiomic features: A systematic review. Int. J. Radiat. Oncol. Biol. Phys. 2018;102:1143–1158. doi: 10.1016/j.ijrobp.2018.05.053.
  • 32. Meyer M, et al. Reproducibility of CT radiomic features within the same patient: Influence of radiation dose and CT reconstruction settings. Radiology. 2019;293:583–591. doi: 10.1148/radiol.2019190928.
  • 33. van Timmeren JE, et al. Test–retest data for radiomics feature stability analysis: Generalizable or study-specific? Tomography. 2016;2:361–365. doi: 10.18383/j.tom.2016.00208.
  • 34. Perkonigg M, et al. Dynamic memory to alleviate catastrophic forgetting in continual learning with medical imaging. Nat. Commun. 2021;12:5678. doi: 10.1038/s41467-021-25858-z.
  • 35. Zhang L, et al. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans. Med. Imaging. 2020;39:2531–2540. doi: 10.1109/TMI.2020.2973595.
  • 36. Agostinelli S, et al. Geant4-a simulation toolkit. Nucl. Instrum. Methods Phys. Res. Sect. A. 2003;506:250–303. doi: 10.1016/S0168-9002(03)01368-8.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
