Author manuscript; available in PMC: 2022 Sep 15.
Published in final edited form as: Phys Med Biol. 2021 Apr 6;66(7):074004. doi: 10.1088/1361-6560/abeea5

Standardization of histogram- and gray-level co-occurrence matrices-based radiomics in the presence of blur and noise

Grace J Gang 1, Radhika Deshpande 1, J Webster Stayman 1
PMCID: PMC8607458  NIHMSID: NIHMS1755424  PMID: 33822750

Abstract

Radiomics have been extensively investigated as quantitative biomarkers that can enhance the utility of imaging studies and aid the clinical decision making process. A major challenge to the clinical translation of radiomics is their variability as a result of different imaging and reconstruction protocols. In this work, we present a novel radiomics standardization framework capable of modeling and recovering the underlying radiomic feature in images that have been corrupted by the effects of spatial resolution and noise. We focus on two classes of radiomics based on pixel value distributions—i.e. histograms and gray-level co-occurrence matrices (GLCMs). We developed a model that predicts these distributions in the presence of system blur and noise, and used that model to invert these physical effects and recover the underlying distributions. Specifically, the effect of blur on histogram and GLCM is highly image-dependent, while additive noise convolves the histogram/GLCM of the noiseless image with those of the noise. The recovery method therefore consists of two deconvolution operations: the first in the image domain to remove the effect of system blur, the second in the histogram/GLCM domain to remove the effect of noise. The performance of the proposed recovery strategy was investigated using a set of texture phantoms and an emulated computed tomography imaging chain with a range of realistic blur and noise levels. The proposed method was able to obtain histogram and GLCM estimates that closely resemble the ground truth. The method performed well across imaging conditions and significantly lowered the variability associated with different imaging protocols. This improvement also translated to better classification accuracy, where recovered radiomic values result in greater separation of radiomic clusters for two different texture phantoms as compared to values derived from the original blurred and noisy images. 
In summary, the novel radiomics standardization framework demonstrates high potential for mitigating radiomic variability as a result of the imaging system and can potentially be integrated as a preprocessing step towards more robust and reproducible radiomic models.

Keywords: computer aided diagnosis, image biomarker estimation, radiomics recovery, radiomics harmonization, cascaded systems analysis

1. Introduction

Radiomics, or image biomarkers, can significantly boost the utility of imaging studies by using quantitative image features to infer underlying physiological processes, diagnose and characterize pathologies, and aid in clinical decision making (Aerts et al 2014, Lambin et al 2017, Avanzo et al 2017). The increasing availability of large-scale image databases and the widespread investigation of machine learning methods in recent years have provided a particularly conducive environment for radiomics research and applications. Radiomics have been extensively investigated in oncology and are finding applications in other diseases and in a wide range of organ systems (Nie et al 2016, Isensee et al 2017, Kolossváry et al 2018).

Despite the potential of radiomics, a major challenge to clinical translation is the reproducibility and repeatability of radiomic models. A major contributor to such concern is the variability in radiomics values as a result of different image acquisition and reconstruction protocols (Kumar et al 2012, Rizzo et al 2018). Radiomics are computed directly from image data, and are hence affected by the underlying image properties. Such image properties can be quantified in terms of the spatial resolution (including bias) and noise, and are, in turn, dependent on the imaging chain—scanner specifications, acquisition techniques, reconstruction parameters, and even the particular patient anatomy. Ideally, radiomic data should capture the variability in the actual signal, e.g. the patient-specific variability in phenotypical expression of the underlying biological signature. This ‘desirable’ variability may be revealed through efficient feature discovery in large populations where patients can be divided into ‘clusters’ to direct treatments, etc. In contrast, variability of the same radiomic feature as a result of data formation and processing is ‘undesirable’ and may obscure the signatures we aim to capture. Moreover, models trained on data from one set of imaging conditions may not apply for data acquired from another set of conditions (Zhao et al 2016, Kim et al 2019).

Such image-chain-based variability in radiomics is common among all imaging modalities, and is especially problematic for multi-center databases with institution- and/or radiologist-specific scanners and protocols. In x-ray computed tomography (CT), variability in image properties is well-known—e.g. noise increases with larger patient habitus, contrast is dependent on tube voltage, etc. Furthermore, both spatial resolution and noise (including both magnitude and correlation) are potentially space-variant, even in conventional linear processing methods (filtered-backprojection, FBP) (Gang et al 2014). The recent surge in advanced nonlinear data processing (e.g. model-based estimation and deep learning) introduces additional data dependencies and exhibits image properties that are distinct from conventional processing methods (Gang et al 2014, Solomon et al 2015).

To address this challenge, current research effort falls into three categories. Harmonizing acquisition and processing protocols (Kumar et al 2012, Ger et al 2018) is an obvious solution but is challenging to realize in clinical settings due to the large diversity of imaging hardware and software. In addition, even a standard imaging protocol would not resolve image quality variability within the patient anatomy. Alternatively, some studies focus on identifying radiomic features that are robust across different imaging conditions so that ‘unstable’ features can be excluded from radiomic models (Larue et al 2017b). This approach reduces variability but may also reduce the set of useful features. A third category of solutions focuses on ways to normalize/standardize radiomic values from different imaging conditions to a common baseline. Standardization can happen directly in the radiomic feature domain or in the image domain prior to radiomic feature calculation. The former typically involves empirically fitted relationships for individual features based on certain image properties, e.g. signal-to-noise ratio from a calibration phantom (Zhovannik et al 2019), peak frequencies and maximum intensities of the noise power spectrum (NPS) (Shafiq-ul Hassan et al 2017a). The latter includes the application of spatial filters (Mackin et al 2019) or deep learning techniques (Choe et al 2019) to match images reconstructed with different kernels, image denoising (Kim et al 2010), as well as image resampling to account for voxel size differences (Shafiq-ul Hassan et al 2017b, Mackin et al 2017).

In this work, we propose a novel approach for radiomics standardization. While current solutions tend to focus on particular settings on specific scanners (e.g. reconstruction kernel, exposure, voxel size), none has presented a systematic treatment of both spatial resolution and noise. Furthermore, radiomics computations can be treated as an additional step in the imaging chain and stand to benefit from the same type of theoretical modeling that has guided CT system design for decades. In this work, we focus on radiomics that treat the distribution of pixel values like a stochastic process with characterizations based on histograms and gray-level co-occurrence matrices (GLCMs). We present models for predicting histograms and GLCMs in the presence of image blur and noise, and then develop strategies for estimation of the true underlying histograms and GLCMs. Performance is evaluated in a phantom study with a realistic range of blur and noise conditions using a cascaded systems analysis model of CT image properties. Preliminary studies using the proposed methodology were presented for GLCM in Gang et al (2021). This work contains a more thorough theoretical development and an extended evaluation of the proposed techniques.

2. Theoretical methods

In this section, we aim to establish theoretical methods for predicting and recovering pixel value distributions as represented by histograms and GLCMs. The prediction aspect extends a rich body of work in system and image quality modeling (Tward and Siewerdsen 2008, Gang et al 2012) and treats radiomics computation as an additional step in the imaging chain. Given ground truth images/radiomics and a model of the physical and mathematical processes at every stage of data acquisition and processing, one may predict radiomics values for different imaging conditions. The prediction model also permits inversion of the process to recover the ground truth radiomics given the original image data and mathematical characterizations of the image properties (e.g. from either theoretical predictions or empirical measurements). Figure 1 illustrates a revised radiomic workflow where the proposed recovery method replaces the typical radiomic computation step acting directly on the reconstructed image. Instead, the method proposed herein requires as inputs the reconstructed image and its associated noise and resolution properties. Recovered radiomic values can then serve as inputs to a radiomic model. The proposed methods may also be combined with additional standardization methods such as ComBat (Orlhac et al 2018).

Figure 1.

Figure 1.

A revised radiomic workflow with the proposed recovery method.

2.1. Prediction—effect of blur and noise on pixel value distributions

If one considers the pixel values in an image (or image patch) to be samples of an underlying stochastic process represented by a probability density function (PDF), one can consider histograms and GLCMs to be discretized forms of that underlying distribution. Such an object model is convenient for describing how these distributions are propagated through the imaging chain. We focus our investigation on systems that are linear and shift-invariant. Such an assumption also applies for local regions-of-interest in an FBP reconstruction or a locally linearizable model-based reconstruction (Gang et al 2014), where noise and resolution properties are locally stationary and shift-invariant. Thus, we presume the image data from such systems, μ^, is related to the ground truth image, μ, by the following model:

$\hat{\mu} = b_1 * \mu + b_2 * n$, (1)

where n is an additive noise source, b1 is the system blur, and b2 is a filter that permits modeling of (potentially) correlated noise imparted by the imaging system. In the general case, b1 and b2 may be different. For example, in CT, some sources of blur impart noise correlations (e.g., light spread in the scintillator), while some do not (e.g., focal spot blur). The above model can be considered a distillation of the entire imaging chain into the overall noise and resolution effects imparted to the true underlying image. In practice, b1 and b2 can either be empirically measured from the imaging system, or predicted by image quality models. For CT systems employing FBP reconstructions, b1 can be determined for each reconstruction kernel, and b2 can be estimated for each local ROI using b1 and an anatomical model based on the reconstruction.
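As a concrete illustration, the forward model of equation (1) can be sketched in a few lines. Here Gaussian kernels stand in for b1 and b2 (in this work the actual kernels come from the CT system model or measurements), and the kernel widths and noise level are arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def forward_model(mu, sigma_b1, sigma_b2, noise_std, seed=None):
    """Emulate eq. (1): blurred signal plus filtered (correlated) noise."""
    rng = np.random.default_rng(seed)
    n = rng.normal(0.0, noise_std, size=mu.shape)  # additive white noise n
    signal = gaussian_filter(mu, sigma_b1)         # b1 * mu
    corr_noise = gaussian_filter(n, sigma_b2)      # b2 * n
    return signal + corr_noise

# A flat "image" keeps its mean value, while the filtered noise is weaker
# and more correlated than the white noise that produced it.
mu = np.full((256, 256), 0.01)
img = forward_model(mu, sigma_b1=3.0, sigma_b2=3.0, noise_std=0.002, seed=0)
```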

The effect of blur on radiomics can be highly dependent on image content since pixel values in a blurred image are dependent on their neighbors. As a simple example, a random shuffling of all image pixels does not change the ground truth histogram, but would change the histogram of the blurred image. Thus, histogram recovery requires some knowledge of the spatial distribution of pixel values, and modeling the effect of blur requires the ground truth image as input.

The effect of noise, on the other hand, does not necessarily require knowledge of the image. If the noise is independent of pixel values, only the ‘PDF’ associated with the noisy image is required. Additive noise will broaden the underlying ground truth histogram or GLCM matrix through a convolution with the noise distribution (Hogg et al 2005):

$H_{\hat{\mu}} = H_{\mu} * H_{n}$
$G_{\hat{\mu}} = G_{\mu} * G_{n}$, (2)

where H and G denote the histogram and GLCM, respectively. For any noise realization with a sufficient number of pixels/samples, the histogram is simply a discretized version of its PDF. Assuming n is white Gaussian noise with standard deviation σ, i.e. $n \sim \mathcal{G}(0, \sigma^2)$, the histogram of correlated noise, $b_2 * n$, is also Gaussian, with variance scaled by the sum of squares of every element in $b_2$:

$H_{b_2 * n} = \mathcal{G}\!\left(0, \textstyle\sum_{j,k} b_2(j,k)^2 \times \sigma^2\right)$. (3)
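Equation (3) can be checked numerically: filtering white Gaussian noise with a kernel scales its variance by the sum of squared kernel elements (the 3 × 3 box kernel below is an arbitrary stand-in for b2):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
sigma = 0.1
n = rng.normal(0.0, sigma, size=(1024, 1024))  # white Gaussian noise

b2 = np.ones((3, 3)) / 9.0                     # stand-in noise filter
corr = fftconvolve(n, b2, mode="valid")        # correlated noise b2 * n

# Eq. (3): predicted variance of the filtered noise.
predicted_var = np.sum(b2**2) * sigma**2
```

The empirical variance of `corr` matches `predicted_var` to within sampling error.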

Similarly, the GLCM of noise is the discretized joint probability distribution between two pixels of a certain spatial offset (denoted as [a, b] in 2D). Under the same assumption of white Gaussian noise, the GLCM of b2 * n can be modeled as a bivariate Gaussian distribution:

$G_n[a,b] = \dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \times \exp\!\left\{ -\dfrac{1}{2(1-\rho^2)} \left[ \dfrac{(x-\mu_X)^2}{\sigma_X^2} - \dfrac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \dfrac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right\}$, (4)

where the correlation coefficient ρ is given by the value of the autocorrelation of $b_2$ (denoted as $R_{b_2 b_2}$) at the corresponding spatial offset:

$\rho = R_{b_2 b_2}(a, b)$. (5)
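The relation in equation (5) can likewise be verified empirically. In this sketch the autocorrelation is normalized by its zero-lag value, which is equivalent to equation (5) when b2 is scaled to unit energy; the separable smoothing kernel is an arbitrary choice:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(2)
n = rng.normal(size=(1024, 1024))            # white noise
k = np.array([0.25, 0.5, 0.25])
b2 = np.outer(k, k)                          # stand-in noise filter
corr = fftconvolve(n, b2, mode="valid")      # correlated noise b2 * n

# Normalized autocorrelation of b2 at spatial offset [0, 1].
auto = fftconvolve(b2, b2[::-1, ::-1])
cy, cx = auto.shape[0] // 2, auto.shape[1] // 2
rho_pred = auto[cy, cx + 1] / auto[cy, cx]

# Empirical correlation between horizontally adjacent pixels.
rho_emp = np.corrcoef(corr[:, :-1].ravel(), corr[:, 1:].ravel())[0, 1]
```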

In summary, the forward model for predicting the histogram and GLCM based on the ground truth image is:

$H_{\hat{\mu}} = H_{b_1 * \mu} * H_{b_2 * n}$
$G_{\hat{\mu}} = G_{b_1 * \mu} * G_{b_2 * n}$. (6)

The effect of blur and noise on the GLCM of an example image (Weber 1997) is illustrated in figure 2. For purposes of illustration, we present an example where both b1 and b2 are Gaussian blurs with σ = 3 pixels; n is white Gaussian noise with σ² = 0.1. The example GLCM is computed with a spatial offset of [0, 3]. In this case, blur narrows the GLCM, while the addition of noise broadens the distribution.

Figure 2.

Figure 2.

The prediction forward model and recovery method shown for an example image and a particular GLCM spatial offset ([0, 3]). The effect of blur on GLCM is highly image-dependent, while additive noise broadens the GLCM through convolution with the GLCM of the noise itself. The recovery method involves two deconvolution steps—one in the image domain to remove the effect of blur, the second in the GLCM domain to remove the effect of noise.

2.2. Recovery based on known noise and resolution properties

Based on the forward model, we would like to develop methods to recover the underlying distributions associated with the ground-truth image. While some approaches might try to first estimate the underlying true image, this is difficult since it would involve both deconvolution of image blur as well as denoising. These are generally competing operations (e.g. deconvolution tends to increase noise, and denoising tends to apply some kind of blur), making simultaneous deblurring and denoising a very challenging operation.

In this work, rather than trying to solve the difficult problem of finding the underlying true image, we instead focus on finding the underlying distribution. The above forward model suggests an approach for direct estimation of the distribution without computing a true image estimate. Specifically, the ground truth histogram or GLCM can be recovered via two deconvolution operations if the blur and noise distribution in the image are known a priori. Steps in this process are illustrated in the right half of figure 2 and are discussed in the following paragraphs.

First, we deconvolve the blur from the blurred and noisy image. Blur removal must necessarily be performed first in the image domain prior to noise removal—if we perform noise removal in the GLCM/histogram domain first, the resulting distributions no longer contain the required spatial information for deconvolution of system blur. This deconvolution can be performed using any number of standard approaches; however, one should choose an approach where the noise properties of the deblurred image may be predicted given the noise distribution in the original image. We opt for a direct Fourier domain deconvolution method. Ideally, this inversion is unbiased; however, to avoid undue noise amplification, we add a frequency domain squared-sine regularizer, P(f), that discourages noise in higher frequency components. Mathematically, this step is given by:

$\mu_d = \mathcal{F}^{-1}\!\left\{ \dfrac{\mathcal{F}\{\hat{\mu}\}}{\mathcal{F}\{b_1\} + \beta P(f)} \right\}$, where $P(f) = \begin{cases} \sin^2(\pi f a) & \text{if } f \le f_{\mathrm{Nyq}} \\ \epsilon & \text{if } f > f_{\mathrm{Nyq}} \end{cases}$. (7)

Here, μd is the deblurred image, a is the voxel size, f denotes the frequency axis, fNyq is the Nyquist frequency, and ϵ is a small number (set to 10−13 in this work) to avoid division errors in implementation. The parameter β controls the strength of the regularizer and is tuned on a case-by-case basis as detailed in section 3.4.
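A minimal sketch of the deconvolution in equation (7) follows; the `deblur_fourier` helper and the Gaussian OTF in the round-trip check are illustrative assumptions, not the exact implementation used in this work:

```python
import numpy as np

def deblur_fourier(img, b1_otf, beta, a):
    """Regularized Fourier deconvolution following eq. (7).

    b1_otf is the system OTF F{b1}, sampled on the same FFT grid as img;
    beta scales the squared-sine regularizer P(f)."""
    fy = np.fft.fftfreq(img.shape[0], d=a)[:, None]
    fx = np.fft.fftfreq(img.shape[1], d=a)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f_nyq = 1.0 / (2.0 * a)
    # Squared-sine regularizer penalizing high frequencies.
    P = np.where(f <= f_nyq, np.sin(np.pi * f * a)**2, 1e-13)
    return np.real(np.fft.ifft2(np.fft.fft2(img) / (b1_otf + beta * P)))

# Round-trip check with beta = 0 and an OTF that is nonzero everywhere:
rng = np.random.default_rng(3)
truth = rng.normal(size=(64, 64))
fy = np.fft.fftfreq(64)[:, None]
fx = np.fft.fftfreq(64)[None, :]
otf = np.exp(-8.0 * (fx**2 + fy**2))
blurred = np.real(np.fft.ifft2(np.fft.fft2(truth) * otf))
recovered = deblur_fourier(blurred, otf, beta=0.0, a=1.0)
```

With β > 0 the division is stabilized at the cost of bias, mirroring the trade-off discussed above.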

After blur deconvolution, the noise in the processed image, nd, may be written as:

$n_d = b_1^{-1} * b_2 * n$. (8)

Since both b1 and b2 are known, the histogram and GLCM of nd can be analytically computed according to equations (3) and (4). The effect of noise can then be removed via a deconvolution in the histogram or GLCM domains, respectively:

$H_{\mu} = H_{b_1^{-1} * b_2 * n}^{-1} * H_{b_1^{-1} * \hat{\mu}}$
$G_{\mu} = G_{b_1^{-1} * b_2 * n}^{-1} * G_{b_1^{-1} * \hat{\mu}}$. (9)

Unlike the first deconvolution, this second deconvolution operation can be performed using any method (i.e., there is no requirement of predictability for noise propagation). For this work, we used the iterative Richardson–Lucy deconvolution method.
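For completeness, Richardson–Lucy deconvolution in the histogram domain can be sketched as follows. This minimal 1D implementation and the two-peak toy histogram are illustrative assumptions (nonnegative inputs, symmetric noise kernel), not the paper's implementation:

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy_1d(observed, psf, iters=200):
    """Minimal Richardson–Lucy deconvolution for a 1D histogram."""
    est = np.full_like(observed, observed.mean())
    psf_flip = psf[::-1]
    for _ in range(iters):
        conv = fftconvolve(est, psf, mode="same")
        ratio = observed / np.maximum(conv, 1e-12)
        est = est * fftconvolve(ratio, psf_flip, mode="same")
    return est

# Toy check: a two-peak "histogram" broadened by a Gaussian noise kernel,
# as in eq. (2), then deconvolved to recover the peaks.
x = np.arange(256)
truth = np.exp(-0.5 * ((x - 80) / 3)**2) + np.exp(-0.5 * ((x - 160) / 3)**2)
kernel = np.exp(-0.5 * ((x - 128) / 10)**2)
kernel /= kernel.sum()
observed = fftconvolve(truth, kernel, mode="same")
recovered = richardson_lucy_1d(observed, kernel)
```

Richardson–Lucy preserves nonnegativity, a natural constraint for histograms and GLCMs.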

In summary, the recovery process consists of two sequential deconvolution operations: first, deconvolution in the image domain to remove the effect of blur; second, in the histogram/GLCM domain to remove the effect of noise. The recovery process is illustrated in the right half of figure 2. In this particular case, the regularization strength β is 0.0003. Noise amplification can be observed in the deblurred image. The GLCM domain deconvolution is able to recover the distribution of the ground truth image fairly well. A systematic evaluation of this recovery procedure is detailed in the following section.

3. Experimental methods

3.1. Phantom and ground truth texture information

For evaluation of the prediction and recovery framework, we used 3D-printed texture phantoms developed in previous work (Shi et al 2019). Briefly, we developed 3D models for phantoms using a procedural texture generation method wherein spherical voids are packed within a volume. Void placement is semi-random where a predefined amount of overlap with at least one existing void is guaranteed such that the void space is interconnected. Based on the size and overlap of the spherical voids, different textures can be generated. The 3D models were printed on a commercial stereolithography (SLA) printer (Peopoly Moai, Hong Kong) with a 70 μm laser spot and 25 μm layer height. The phantoms were scanned on a microCT scanner (Bruker SkyScan 1172, Billerica, MA) at 50 kVp and 200 μA, and reconstructed using the FBP algorithm at an isotropic voxel size of 27 μm. The original reconstructions provided by the scanner were discretized to 128 gray levels. We first scaled all voxel values to the range of linear attenuation coefficients from 0 to 0.02 mm−1. Next, to simulate a continuous distribution of voxel values, we inserted a low noise floor using white Gaussian noise with the standard deviation equal to 2% of the maximum pixel value. These images and radiomics computed thereof were then treated as the ground truth. In this work, we used three textures generated using the above method as shown in figure 3. Respectively, the void sizes for the textures are (1) uniformly distributed between 1 and 2 mm (labeled as heterogeneous), (2) 1.4 mm, and (3) 1.6 mm.

Figure 3.

Figure 3.

Texture phantoms generated by procedurally removing voids of various sizes from a cylindrical phantom. A photo of the 3D printed physical phantom is shown along with microCT scans of all three phantoms exhibiting different textures. The microCT reconstructions and radiomics computed thereof are treated as the ground truth.

3.2. Simulation of realistic CT blur and noise

To exercise the recovery method on images with realistic CT image properties, we emulated different levels of blur (b1), noise correlation (b2), and noise magnitude from a CT cascaded system model established in previous work (Tward and Siewerdsen 2008, Gang et al 2014). Such models describe signal and noise propagation from the x-ray source through the patient, detector, and reconstruction to yield the spatial resolution (in terms of the modulation transfer function, MTF) and noise (in terms of the noise power spectrum, NPS) in the reconstructed image. This type of model has been extensively validated on many CT systems under various imaging conditions (Gang et al 2011, Zhao et al 2014). As an example system, we simulated a 90 kV x-ray source spectrum using Spektr (Siewerdsen et al 2004) with 1.6 mm Al as the intrinsic filtration and an additional 2 mm Al and 0.2 mm Cu as the extrinsic filtration. The detector contains a 600 μm CsI scintillator and has a pixel pitch of 0.194 mm. Reconstruction follows the FBP algorithm using a Hann apodization filter with an adjustable cutoff parameter, c0, that controls the maximum frequency content of the filter and hence the spatial resolution and noise correlation in the reconstruction:

$T_{\mathrm{Hann}} = \begin{cases} h + (1-h)\cos\!\left(\dfrac{2\pi a_u f_u}{c_0}\right) & \text{if } f_u \le c_0 f_{u,\mathrm{Nyq}} \\ 0 & \text{if } f_u > c_0 f_{u,\mathrm{Nyq}} \end{cases}$, (10)

where $h$ is 0.5, $a_u$ is the pixel size, and $f_u$ and $f_{u,\mathrm{Nyq}}$ are the corresponding frequency axis and Nyquist frequency, respectively. The reconstruction voxel size is chosen to be the same as the native microCT voxel size, 27 μm.
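The apodization filter of equation (10) is straightforward to implement; the sketch below (the helper name is ours) accepts scalar or array frequency input:

```python
import numpy as np

def hann_apodization(f_u, a_u, c0, h=0.5):
    """Hann apodization filter of eq. (10) with adjustable cutoff c0."""
    f_nyq = 1.0 / (2.0 * a_u)  # Nyquist frequency f_u,Nyq
    T = h + (1.0 - h) * np.cos(2.0 * np.pi * a_u * f_u / c0)
    return np.where(np.abs(f_u) <= c0 * f_nyq, T, 0.0)
```

The filter is unity at zero frequency and rolls off to zero at the scaled cutoff c0 · fNyq; smaller c0 thus trades spatial resolution for lower noise.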

We simulated different levels of system blur and noise correlation by varying c0 from 0.1 to 1.0 in 0.1 increments. We further simulated different noise magnitudes by varying the tube current–time product (mAs) over two orders of magnitude, from 1 to 0.01 mAs. At 1 mAs, the imaging technique results in a 1.1 mR barebeam exposure. The noise level in the reconstruction is dependent on both c0 and mAs, with higher cutoff frequencies and lower mAs driving higher noise levels. We derived b1, b2, and the noise magnitude directly from the model and applied them to the ground truth microCT image according to equation (1) to simulate CT reconstructions.

3.3. Radiomics computation

From section 3.1, voxel values in the microCT reconstructions were scaled to values between 0 and 0.02 mm−1. We set the maximum and minimum of the histogram and GLCM to a broader range from −0.1 to 0.1 to encompass all possible voxel values (e.g. with noise) under the various imaging conditions and under each recovery step—i.e. none of the histograms and GLCMs have voxels ‘piled up’ at the extremal bins. Nominally, we set the discretization level, nl, to 512 bins between −0.1 and 0.1 to provide sufficient sampling to approximate a continuous distribution. For the results shown below, the spatial offset for GLCM is set to [0, 1], i.e. between a voxel and its horizontal neighbor to the right. This nearby offset was chosen because it represents a challenging scenario where blur and correlated noise are significant. An asymmetric GLCM was computed. Individual radiomic features based on histograms and GLCMs were implemented according to definitions in the open source package pyradiomics (Van Griethuysen et al 2017).
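An asymmetric GLCM with the above discretization can be computed along the lines below. This is a simplified sketch (only nonnegative offsets are handled); the actual feature definitions in this work follow pyradiomics:

```python
import numpy as np

def glcm(img, n_levels=512, vmin=-0.1, vmax=0.1, offset=(0, 1)):
    """Asymmetric GLCM for one spatial offset, discretizing voxel
    values into n_levels bins over [vmin, vmax]."""
    q = ((img - vmin) / (vmax - vmin) * n_levels).astype(int)
    q = np.clip(q, 0, n_levels - 1)
    dy, dx = offset  # nonnegative offsets only in this sketch
    a = q[:q.shape[0] - dy, :q.shape[1] - dx].ravel()
    b = q[dy:, dx:].ravel()
    G = np.zeros((n_levels, n_levels))
    np.add.at(G, (a, b), 1.0)  # count co-occurrences
    return G / G.sum()         # normalize to a joint probability
```

For offset (0, 1), each entry G[i, j] is the probability that a voxel in bin i has its right-hand neighbor in bin j.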

3.4. Optimization of the initial deconvolution (β selection)

For each combination of the blur parameter, c0, and noise parameter, mAs, we optimized β using an exhaustive search over values ranging from 0 to 40. We performed the recovery process for 10 different noise realizations for each β, and the β yielding the minimum mean root mean square error (RMSE) between the ground truth and recovered histograms was selected. The same process was used for GLCM recovery. We found that the optimal β for different images was fairly close, hence the optimum obtained for a single image slice in the heterogeneous void phantom was applied to all other images.
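The exhaustive search above reduces to a small grid search. The sketch below assumes a caller-supplied `recover_fn(beta, seed)` that runs the full recovery for one noise realization and returns a histogram; both the helper and its interface are hypothetical:

```python
import numpy as np

def select_beta(recover_fn, truth_hist, betas, n_realizations=10):
    """Pick the beta minimizing the mean RMSE to the ground truth
    histogram over several noise realizations."""
    def mean_rmse(beta):
        errs = [np.sqrt(np.mean((recover_fn(beta, s) - truth_hist)**2))
                for s in range(n_realizations)]
        return float(np.mean(errs))
    return min(betas, key=mean_rmse)
```

For example, with a mock `recover_fn` whose error is minimized at β = 2, `select_beta` returns 2.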

3.5. Evaluation of recovery performance

The performance of the recovery method was assessed across imaging conditions (section 3.2) in terms of the normalized RMSE between the ground truth and recovered histogram and GLCM. The normalization factor was chosen to be the total number of elements in the histogram (512) or GLCM (512 × 512). We further computed individual histogram- and GLCM-based radiomic features and compared the values among the ground truth, blurred and noisy, and recovered cases. While the recovery method is applicable to both 2D and 3D images, results presented in this work were performed on 2D image slices within a 511 × 511 ROI.

While the evaluation methods above focus on the ability to recover ground truth radiomic values, ground truth is rarely available in clinical applications. Rather, radiomics are primarily used to draw inferences or make predictions. Therefore, we emulated a simple classification problem where radiomic features are used to distinguish between the two different textures presented in the 1.4 mm and 1.6 mm void texture phantoms (figure 3). We evaluated the effect of radiomic recovery on classification performance when the two phantoms were imaged under different conditions. We identified two radiomic features in the ground truth images that can separate the two textures, cluster tendency and cluster prominence, which served as the independent variables. Ten slices were extracted from each phantom to represent the intrinsic variability in each texture category. The slices were sufficiently separated from each other (28 slices apart) to ensure minimal overlap in structural details. We simulated CT blur and noise corresponding to c0 = 1.0, 0.7, 0.5, and mAs = 0.03, 0.1, 1.0, and generated ten noise realizations for each imaging condition to capture statistical variability. The texture features for each slice, imaging condition, and noise realization were pooled together (400 in total) and used for classification. The two texture features were calculated for the ground truth, blurred and noisy, and recovered cases. We fitted a linear discriminant analysis (LDA) model to classify each case using 5-fold cross validation and computed the classification accuracy for each case for comparison.
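The classification step can be sketched with scikit-learn. The Gaussian clusters below are synthetic stand-ins for the (cluster tendency, cluster prominence) pairs of the two textures, with entirely arbitrary means and spread:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
# Synthetic 2D feature vectors for two texture classes.
tex1 = rng.normal([1.0, 2.0], 0.3, size=(200, 2))
tex2 = rng.normal([1.6, 2.8], 0.3, size=(200, 2))
X = np.vstack([tex1, tex2])
y = np.array([0] * 200 + [1] * 200)

# 5-fold cross-validated LDA accuracy, as in the evaluation above.
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

Well-separated clusters yield accuracy near 100%; overlap between the classes (as induced by blur and noise) lowers it.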

Lastly, we evaluated the effect of discretization levels (nl) on recovery performance by progressively reducing nl by half from the nominal level (512 over [−0.1, 0.1]). The histogram/GLCM and derived radiomic values are not directly comparable across different discretization levels. Therefore, we instead compared the recovered distributions to the blurred and noisy ones to illustrate the improvement as a result of the recovery process. We chose a nominal imaging condition, c0 = 1.0, mAs = 1.0, and presented the ratio of the RMSE of recovered to ground truth, to the RMSE of blurred and noisy to ground truth distributions.

4. Results

The results of the investigations on the optimal regularization strength, β, in the initial deconvolution (equation (7)) are summarized in figure 4(a) as a function of the cutoff parameter, c0, and mAs. The system blur in these plots varies as a function of c0, while the noise level varies as a function of both c0 and mAs. The optimal β is small (<1) for a majority of imaging conditions but increases quickly as image noise increases (low mAs, high c0). This is expected due to the greater noise amplification as a result of deconvolution for these high noise cases. The recovery performance in terms of the RMSE between the ground truth and recovered GLCM and histogram are shown in figures 4(b) and (c), respectively. The trends in RMSE mirror that of the optimal β study, where the high noise cases are more challenging to recover.

Figure 4.

Figure 4.

(a) Optimal β as a function of spatial resolution (as controlled by the cutoff parameter, c0) and dose (as controlled by mAs). (b), (c): The RMSE between the ground truth and recovered GLCM and histogram using the optimal β at each imaging condition.

To further illustrate the recovery performance and substantiate the RMSE results, the images, histograms, and GLCMs for select imaging conditions are shown in figure 5. The imaging conditions were selected to showcase a range of blur and noise including the most challenging cases. For the blurry and noisy cases shown in rows 2 and 4, increasing noise broadens the histogram and GLCM while increasing blur narrows them. These histograms and GLCMs are distinct from the ground truth under all imaging conditions. For mAs = 1.0 and 0.1, the recovery method yields histograms and GLCMs close to the ground truth. For the most challenging high noise case at mAs = 0.01, the recovered results achieve a similar structure to the ground truth (e.g. two peaks) but are biased due to the large β used in the first deconvolution step. Interestingly, when the resolution is low (c0 = 0.1), the recovered results once again resemble the ground truth.

Figure 5.

Figure 5.

The ground truth image, histogram, and GLCM are shown in the left column. Different blur and noise are applied to the ground truth image (row 1), with the corresponding histogram (row 2) and GLCM (row 4), compared to those recovered (rows 3 and 5) following the methods in section 2.2.

The effect of recovery on radiomic features is shown in figure 6. For each imaging condition, we computed a total of 34 features from both histogram and GLCM estimates over 10 noise realizations and computed their mean absolute percentage difference from the ground truth—i.e. the metric is 0 if the radiomic feature value perfectly aligns with the ground truth. The results are plotted in figure 6(a) for each imaging condition for both the blurry and noisy case and the recovered case. In addition, we plotted the percentage difference for individual radiomic features for the imaging conditions where the median (1.4%, at c0 = 0.5 and mAs = 0.10) and the worst (17.2%, at c0 = 0.9 and mAs = 0.01) recovery performance occur. The error bars represent the standard deviation over 10 noise realizations.

Figure 6.

Figure 6.

(a) The mean absolute percentage difference from ground truth over 34 radiomic features computed from the histogram and GLCM plotted against c0 and mAs. The recovered case (below) outperforms the blurred and noisy case over all imaging conditions. (b) The percentage difference of all radiomic features at the imaging conditions corresponding to the median and the worst recovery performance. The error bars represent the standard deviation over 10 noise realizations.

The blurred and noisy case has greater deviations from the ground truth than the recovered case under all imaging conditions, with the best performance of the blurred and noisy case (34.0%) almost twice that of the worst performance in the recovered case (17.2%). Spatial resolution is seen to have a large effect on the blurred and noisy radiomic values, consistent with observations in figure 5 where both the histogram and GLCM significantly deviate from the ground truth. Interestingly, at higher c0, the radiomic accuracy improves as noise increases. Increasing noise broadens and flattens the histogram and GLCM as theoretically explained by the (energy preserving) convolution in equation (2) and observed in figure 5. While the distributions are still far from the ground truth, there is a reduction in mean percentage difference driven by improvements in radiomic features that emphasize peak values in the histogram and GLCM, e.g. maximum probability and energy. This behavior can also be seen from individual radiomic values plotted in figure 6(b). On the other hand, the performance for the recovered case largely follows previously observed trends for RMSE in figure 4.

For a closer examination of individual radiomic features, we pooled the radiomic feature values under all imaging conditions and plotted the range of percentage differences from ground truth in figure 7. For the blurred and noisy case, some features, including maximum probability, energy, and 10th percentile, are especially sensitive to imaging conditions (with maximum and/or minimum exceeding the range of the plot), while some are fairly robust (including autocorrelation, joint average, correlation, sum average, and mean). For most radiomic features, the blurred and noisy case exhibits a greater range and greater deviation from the ground truth. Comparatively, the recovered values are not only closer to the ground truth, but also show less variability across imaging conditions.

Figure 7.

The range of radiomic feature values across all imaging conditions. The recovered feature values are not only closer to the ground truth but also exhibit smaller variability across imaging conditions.

We further demonstrate the value of the recovery process in a classification problem between the 1.4 and 1.6 mm void phantoms shown in figure 3. The ground truth, blurred and noisy, and recovered estimates of cluster tendency are plotted against cluster prominence for 10 image slices in both phantoms in figure 8. The ground truth radiomics form distinct clusters and the LDA achieves 100% classification accuracy for both phantoms. The recovered radiomics, even with biased results at low mAs, achieve 100% accuracy for texture 1 and 99.0% accuracy for texture 2. The blurred and noisy case, on the other hand, shows mixing between the two textures, and the classification accuracy is reduced to 93.3% for texture 1 and 94.5% for texture 2.
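As an illustration of the classification step, a minimal two-class Fisher linear discriminant on two features can be sketched as follows. This is a generic implementation with synthetic inputs, not the study's code; in the study the two features were the cluster tendency and cluster prominence values.

```python
import numpy as np

def fit_lda(x0, x1):
    """Fit a two-class Fisher linear discriminant.

    x0, x1: (n_samples, n_features) arrays for each class.
    Returns (w, thresh): project a sample onto w and compare to thresh.
    The midpoint threshold assumes roughly equal class covariances.
    """
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    sw = (np.cov(x0, rowvar=False) * (len(x0) - 1)
          + np.cov(x1, rowvar=False) * (len(x1) - 1))  # within-class scatter
    w = np.linalg.solve(sw, m1 - m0)                   # Fisher direction
    thresh = w @ (m0 + m1) / 2                         # midpoint decision rule
    return w, thresh

def lda_accuracy(x0, x1, w, thresh):
    """Fraction of samples on the correct side of the decision boundary."""
    correct = ((np.asarray(x0) @ w < thresh).sum()
               + (np.asarray(x1) @ w >= thresh).sum())
    return correct / (len(x0) + len(x1))
```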

Figure 8.

Two radiomic features, cluster tendency and cluster prominence, are used to classify the two different textures present in the 1.4 and 1.6 mm void phantoms shown in figure 3. The two textures are denoted by the symbols 'cross' and 'square', while the ground truth, blurred and noisy, and recovered radiomics are distinguished by color. The classification accuracy achieved with the ground truth radiomics is 100% for textures 1 and 2; with the blurred and noisy radiomics, 93.3% and 94.5%; with the recovered radiomics, 100% and 99.0%.

The last investigation considers the impact of discretization on the performance of the recovery method. Comparisons between the ground truth and recovered histogram and GLCM are shown for different discretization levels, nl, in figure 9. For reference, the nominal nl is 512 in figure 5. By visual comparison, the recovered histogram closely resembles the ground truth across all three nl values. The recovered GLCM retains the structure of the ground truth, but the peak values show greater deviation at lower nl values. To compare the recovered distributions to the blurred and noisy distributions at each nl, we present the ratio of the RMSE between the recovered and ground truth distributions (RMSEr) to the RMSE between the blurred and noisy and ground truth distributions (RMSEbn). Consistent with the visual observations, the recovered histogram has consistently superior RMSE compared to the blurred and noisy histograms at all nl values (from nl = 512 to 64, the ratio is 0.093, 0.090, 0.089, 0.107). The ratio for the GLCM is similarly consistent until the lowest nl, where the mismatch in the peak GLCM values drives up the ratio (from nl = 512 to 64, the ratio is 0.048, 0.053, 0.055, 0.434). These results indicate that the recovery process is fairly robust across discretization levels and improves both histogram and GLCM compared to the blurred and noisy case.
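The RMSE ratio used in this comparison can be expressed as a short helper; this is an illustrative sketch in which `recovered`, `blurred_noisy`, and `truth` stand for the corresponding histogram or GLCM arrays.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two same-shaped arrays."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(np.mean((a - b) ** 2))

def rmse_ratio(recovered, blurred_noisy, truth):
    """RMSEr / RMSEbn: values well below 1 indicate the recovered
    distribution is much closer to the ground truth than the blurred
    and noisy distribution."""
    return rmse(recovered, truth) / rmse(blurred_noisy, truth)
```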

Figure 9.

Effect of discretization levels on recovery performance. We progressively halved the discretization level, nl, and compared the ground truth histogram and GLCM to those recovered.

5. Discussion and conclusion

We have presented a method for the standardization of radiomics based on distribution measures, including histograms and GLCMs. The method requires as inputs the measured (blurred and noisy) image data and its spatial resolution and noise properties, which can be obtained through theoretical modeling or empirical measurements. Instead of attempting the challenging task of joint deblurring and denoising to recover the underlying true image, we focus on recovery of the underlying distribution. The method involves two deconvolution steps, undoing the effects of blur and noise respectively, and yields the recovered histogram and GLCM. The approach was validated using texture phantoms and realistic models of noise and blur across a range of imaging conditions in an emulated CT system. Treating microCT scans of the phantoms as the ground truth, we have demonstrated the ability to effectively recover the histogram, the GLCM, and derived radiomic features. The recovered radiomic features also improved classification performance compared to unrecovered features computed directly from blurred and noisy images.
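The two deconvolution steps can be sketched schematically as follows, assuming direct Fourier inversion with a small regularization term; the function names, regularization choice, and circular-convolution model are illustrative, not the authors' exact implementation.

```python
import numpy as np

def deblur_fourier(image, mtf, eps=1e-3):
    """Step 1: remove system blur by direct Fourier inversion.

    mtf: blur transfer function sampled on the image's 2D FFT grid.
    eps regularizes the division near zeros of the MTF (illustrative).
    """
    img_f = np.fft.fft2(image)
    inv = np.conj(mtf) / (np.abs(mtf) ** 2 + eps)
    return np.real(np.fft.ifft2(img_f * inv))

def denoise_histogram(noisy_hist, noise_pmf, eps=1e-3):
    """Step 2: deconvolve the noise pmf from the measured histogram.

    Uses regularized Fourier division (circular convolution model),
    then clips to non-negative values and renormalizes to a valid pmf.
    """
    n = len(noisy_hist)
    h_f = np.fft.fft(noisy_hist, n)
    g_f = np.fft.fft(noise_pmf, n)
    est = np.real(np.fft.ifft(h_f * np.conj(g_f) / (np.abs(g_f) ** 2 + eps)))
    est = np.clip(est, 0.0, None)
    return est / est.sum()
```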

In contrast to existing approaches of radiomic standardization that often focus on specific imaging scenarios (e.g. scanners and reconstruction kernels), we have presented a systematic treatment that accounts for the inherent spatial resolution and noise properties of the image data. The theoretical model and recovery method is therefore not limited to specific scanners or imaging parameters, but can be generalized to any local ROIs where assumptions of linearity and shift-invariance holds. In addition, we circumvent the challenging problem of finding the underlying ground truth image by recovering the distributions, which in turn allows any derived features to be recovered as well. The proposed method can potentially complement standardization techniques acting directly on radiomic features, e.g. batch effect removal technique like ComBat (Orlhac et al 2018) that removes center-dependent effects.

There are a number of potential refinements to the proposed methodology. The recovery method contains two deconvolution steps. The first deconvolution in the image domain was performed via direct Fourier inversion so that the resulting noise properties are tractable. While we obtained good performance under a wide range of imaging conditions, high noise cases tend to be more challenging due to noise amplification. Future work will consider more sophisticated iterative deconvolution techniques which may present noise advantages while still allowing one to propagate noise properties (Barrett et al 1994). Similarly, more advanced deconvolution techniques will be investigated for the second deconvolution, e.g. one may include non-negativity constraints based on innate properties of the histogram and GLCM.
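As one example of the constrained iterative direction suggested above, a Richardson-Lucy-style multiplicative update enforces non-negativity by construction; the generic 1D sketch below is our own illustration, not the authors' method.

```python
import numpy as np

def richardson_lucy_1d(measured, kernel, n_iter=200, eps=1e-12):
    """Deconvolve `kernel` from the non-negative sequence `measured`.

    The multiplicative Richardson-Lucy update keeps every iterate
    non-negative, a property well suited to histograms and GLCMs.
    Assumes an odd-length blur kernel (normalized internally).
    """
    measured = np.asarray(measured, float)
    kernel = np.asarray(kernel, float)
    kernel = kernel / kernel.sum()
    est = np.full_like(measured, measured.mean())  # flat initial estimate
    for _ in range(n_iter):
        blurred = np.convolve(est, kernel, mode="same")
        ratio = measured / (blurred + eps)          # data / model agreement
        est = est * np.convolve(ratio, kernel[::-1], mode="same")
    return est
```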

This work focused on two classes of radiomics that are based on the distributions of voxel values. There are other classes of radiomics that could potentially leverage the analysis performed here. For example, radiomics based on linear decomposition of the image (e.g. Fourier transform, wavelet transform) may be readily integrated with the existing modeling and recovery methodology by modeling another linear transform between the reconstruction and the radiomic computation steps. Other texture features like gray-level run-length, size zone, and neighborhood gray tone difference matrices are more complex and potentially require different recovery methods. Extending the modeling and recovery framework to wider classes of radiomics is the subject of ongoing investigation.

This work has adopted a somewhat idealized image property model. For example, we demonstrated the recovery process when the system blur is energy-preserving—i.e. no bias exists between the ground truth and the blurred and noisy image. Variability in kV across data can affect the attenuation coefficients in the reconstruction and result in biases in the voxel values, and hence in the radiomic features. This can potentially be mitigated through a calibration method analogous to beam hardening correction. In addition, all images in this work have the same voxel size. Voxel size differences are known to affect radiomic values, and several studies have investigated resampling strategies to address this issue (Shafiq-ul Hassan et al 2017b, Larue et al 2017a, Mackin et al 2017). The effect of sampling can potentially be integrated within the proposed recovery framework, where the image content is resampled while the noise is propagated through the appropriate aperture function and interpolation filters.

While the investigations in this work have shown excellent performance in recovering underlying radiomics, we expect that there are some imaging conditions where the deconvolution cannot restore critical image features, for example coarser sampling or blur functions with significant null spaces, where some image frequency content is never measured by the imaging system. In such cases, it may be necessary to adapt the proposed methodology to perform an incomplete recovery: in effect, restoring the image to a common baseline at higher spatial resolution than the original data, but somewhat lower than that of the true underlying content. Such standardization to a common baseline is the subject of ongoing investigations, and we hope to report strategies for these conditions in the future.

Another important aspect of ongoing work is evaluating our strategies on real data acquired from clinical CT scanners. As detailed in section 2.2, the proposed method requires b1 and b2 to be known a priori. For experimental imaging systems where we have full knowledge of system parameters, this can be achieved through the fully theoretical modeling established in previous work (Tward and Siewerdsen 2008, Gang et al 2014). For clinical systems where some parameters may be proprietary, we will rely on a combination of empirical measurements (using conventional phantom-based MTF and NPS methods) and theoretical modeling (e.g. of focal spot blur and attenuation characteristics through patient anatomy). Such implementation details will be reported in future work. Furthermore, we plan to investigate comparing, and possibly combining, our proposed strategy with statistical methods like ComBat (Orlhac et al 2018). Such investigations are best performed on a large, potentially multi-site dataset acquired with a variety of scanners and imaging protocols, and will be the subject of future work.
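A conventional empirical NPS estimate from repeated noise-only ROIs might be sketched as follows; this is a simplified periodogram average (one common way to characterize the noise model when full system parameters are unavailable), with windowing, detrending, and unit conventions omitted for brevity.

```python
import numpy as np

def estimate_nps(rois, pixel_size=1.0):
    """Empirical 2D noise-power spectrum from noise-only ROIs.

    rois: (n_rois, ny, nx) stack of noise-only regions (e.g. from
    repeated uniform-phantom scans). Each ROI is mean-subtracted and
    its squared FFT magnitude averaged; pixel_size sets physical units.
    """
    rois = np.asarray(rois, float)
    n, ny, nx = rois.shape
    zero_mean = rois - rois.mean(axis=(1, 2), keepdims=True)
    spectra = np.abs(np.fft.fft2(zero_mean, axes=(1, 2))) ** 2
    return spectra.mean(axis=0) * pixel_size ** 2 / (ny * nx)
```

As a sanity check, for uncorrelated noise of variance sigma^2 this estimate is approximately flat at sigma^2 times the squared pixel size.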

In summary, we have proposed a novel radiomics standardization framework to model and recover the effect of spatial resolution and noise on radiomic classes based on distributions of pixel values. The methods and results presented lay the groundwork for a general strategy to mitigate radiomics variability due to the imaging systems, and form a critical step towards building robust, repeatable, and reproducible radiomic models. The work has potential to standardize radiomics across a broad range of imaging conditions and to help facilitate inference and computer aided diagnosis in increasingly diverse and large image databases.

Acknowledgments

This work is supported, in part, by NIH grants R21CA219608 and R01CA249538.

References

1. Aerts HJWL, Velazquez ER, Leijenaar RTH, Parmar C, Grossmann P, Carvalho S, Bussink J, Monshouwer R, Haibe-Kains B and Rietveld D 2014 Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach Nat. Commun. 5 4006
2. Avanzo M, Stancanello J and El Naqa I 2017 Beyond imaging: the promise of radiomics Phys. Med. 38 122–39
3. Barrett HH, Wilson DW and Tsui BM 1994 Noise properties of the EM algorithm: I. Theory Phys. Med. Biol. 39 833–46
4. Choe J, Lee SM, Do K-H, Lee G, Lee J-G, Lee SM and Seo JB 2019 Deep learning-based image conversion of CT reconstruction kernels improves radiomics reproducibility for pulmonary nodules or masses Radiology 292 365–73
5. Gang GJ, Deshpande R and Stayman JW 2021 End-to-end modeling for predicting and estimating radiomics: application to gray level co-occurrence matrices in CT Proc. SPIE Medical Imaging 2021: Physics of Medical Imaging
6. Gang GJ, Lee J, Stayman JW, Tward DJ, Zbijewski W, Prince JL and Siewerdsen JH 2011 Analysis of Fourier-domain task-based detectability index in tomosynthesis and cone-beam CT in relation to human observer performance Med. Phys. 38 1754–68
7. Gang GJ, Stayman JW, Zbijewski W and Siewerdsen JH 2014 Task-based detectability in CT image reconstruction by filtered backprojection and penalized likelihood estimation Med. Phys. 41 081902
8. Gang GJ, Zbijewski W, Stayman JW and Siewerdsen JH 2012 Cascaded systems analysis of noise and detectability in dual-energy cone-beam CT Med. Phys. 39 5145–56
9. Ger RB, Zhou S, Chi P-CM, Lee HJ, Layman RR, Jones AK, Goff DL, Fuller CD, Howell RM and Li H 2018 Comprehensive investigation on controlling for CT imaging variabilities in radiomics studies Sci. Rep. 8 1–14
10. Hogg RV, McKean J and Craig AT 2005 Introduction to Mathematical Statistics (Upper Saddle River, NJ: Pearson Education)
11. Isensee F, Kickingereder P, Wick W, Bendszus M and Maier-Hein KH 2017 Brain tumor segmentation and radiomics survival prediction: contribution to the BraTS 2017 challenge Int. MICCAI Brainlesion Workshop (Berlin: Springer) pp 287–97
12. Kim H, Park CM, Gwak J, Hwang EJ, Lee SY, Jung J, Hong H and Goo JM 2019 Effect of CT reconstruction algorithm on the diagnostic performance of radiomics models: a task-based approach for pulmonary subsolid nodules Am. J. Roentgenol. 212 505–12
13. Kim H et al 2010 A computer-aided diagnosis system for quantitative scoring of extent of lung fibrosis in scleroderma patients Clin. Exp. Rheumatol. 28 S26–35
14. Kolossváry M, Kellermayer M, Merkely B and Maurovich-Horvat P 2018 Cardiac computed tomography radiomics J. Thoracic Imaging 33 26–34
15. Kumar V, Gu Y, Basu S, Berglund A, Eschrich SA, Schabath MB, Forster K, Aerts HJWL, Dekker A and Fenstermacher D 2012 Radiomics: the process and the challenges Magn. Reson. Imaging 30 1234–48
16. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, De Jong EEC, Van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG and Jochems A 2017 Radiomics: the bridge between medical imaging and personalized medicine Nat. Rev. Clin. Oncol. 14 749–62
17. Larue RT et al 2017a Influence of gray level discretization on radiomic feature stability for different CT scanners, tube currents and slice thicknesses: a comprehensive phantom study Acta Oncol. 56 1544–53
18. Larue RTHM, van Timmeren JE, de Jong EEC, Feliciani G, Leijenaar RTH, Schreurs WMJ, Sosef MN, Raat FHPJ, van der Zande FHR and Das M 2017b Influence of gray level discretization on radiomic feature stability for different CT scanners, tube currents and slice thicknesses: a comprehensive phantom study Acta Oncol. 56 1544–53
19. Mackin D, Fave X, Zhang L, Yang J, Jones AK, Ng CS and Court L 2017 Harmonizing the pixel size in retrospective computed tomography radiomics studies PLoS One 12 e0178524
20. Mackin D, Ger R, Gay S, Dodge C, Zhang L, Yang J and Jones AK 2019 Matching and homogenizing convolution kernels for quantitative studies in computed tomography Investigative Radiol. 54 288–95
21. Nie K, Shi L, Chen Q, Hu X, Jabbour SK, Yue N, Niu T and Sun X 2016 Rectal cancer: assessment of neoadjuvant chemoradiation outcome based on radiomics of multiparametric MRI Clin. Cancer Res. 22 5256–64
22. Orlhac F, Boughdad S, Philippe C, Stalla-Bourdillon H, Nioche C, Champion L, Soussan M, Frouin F, Frouin V and Buvat I 2018 A postreconstruction harmonization method for multicenter radiomic studies in PET J. Nucl. Med. 59 1321–8
23. Rizzo S, Botta F, Raimondi S, Origgi D, Fanciullo C, Morganti AG and Bellomi M 2018 Radiomics: the facts and the challenges of image analysis Eur. Radiol. Exp. 2 1–8
24. Shafiq-ul Hassan M, Zhang GG, Hunt DC, Latifi K, Ullah G, Gillies RJ and Moros EG 2017a Accounting for reconstruction kernel-induced variability in CT radiomic features using noise power spectra J. Med. Imaging 5 011013
25. Shafiq-ul Hassan M et al 2017b Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels Med. Phys. 44 1050–62
26. Shi H, Gang G, Li J, Liapi E, Abbey C and Stayman JW 2020 Performance assessment of texture reproduction in high-resolution CT Proc. SPIE Medical Imaging 2020: Image Perception, Observer Performance, and Technology Assessment vol 11316 p 25
27. Siewerdsen J, Waese A, Moseley D, Richard S and Jaffray D 2004 Spektr: a computational tool for x-ray spectral analysis and imaging system optimization Med. Phys. 31 3057–67
28. Solomon J, Mileto A, Ramirez-Giraldo JC and Samei E 2015 Diagnostic performance of an advanced modeled iterative reconstruction algorithm for low-contrast detectability with a third-generation dual-source multidetector CT scanner: potential for radiation dose reduction in a multireader study Radiology 275 735–45
29. Tward DJ and Siewerdsen JH 2008 Cascaded systems analysis of the 3D noise transfer characteristics of flat-panel cone-beam CT Med. Phys. 35 5510–29
30. Van Griethuysen JJ, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RG, Fillion-Robin J-C, Pieper S and Aerts HJ 2017 Computational radiomics system to decode the radiographic phenotype Cancer Res. 77 e104–7
31. Weber AG 1997 The USC-SIPI image database version 5 USC-SIPI Report 315 (University of Southern California)
32. Zhao B, Tan Y, Tsai W-Y, Qi J, Xie C, Lu L and Schwartz LH 2016 Reproducibility of radiomics for deciphering tumor phenotype with imaging Sci. Rep. 6 1–17
33. Zhao Z, Gang G and Siewerdsen J 2014 Noise, sampling, and the number of projections in cone-beam CT with a flat-panel detector Med. Phys. 41 061909
34. Zhovannik I, Bussink J, Traverso A, Shi Z, Kalendralis P, Wee L, Dekker A, Fijten R and Monshouwer R 2019 Learning from scanners: bias reduction and feature correction in radiomics Clin. Transl. Radiat. Oncol. 19 33–8