Abstract
This paper uses computer simulations to investigate whether a more accurate noise model always results in less noisy images in CT iterative reconstruction. We start with a hypothetical, non-realistic noise model for the CT measurements, by assuming that the attenuation coefficient is energy independent and that there is no scattering. A variance formula for this model is derived and presented. Based on this model, computer simulations are conducted with 12 different ad hoc noise weighting methods, and their results are compared. The simple Poisson noise model performs better than other, more accurate models when the projection data are generated with the hypothetical noise model. A more accurate noise model does not necessarily produce a less noisy image. In this counterexample, modeling the system's electronic noise during reconstruction does not help reduce the image noise. A simpler noise model can sometimes outperform a more complicated, more accurate one.
Index Terms: Image reconstruction, noise model, Poisson distribution, X-ray CT
I. Introduction
In transmission tomography, especially in x-ray computed tomography (CT), iterative image reconstruction algorithms usually assume a simple Poisson model [1]–[5]. In this Poisson noise model, the number of photons I0 emitted from the x-ray tube is assumed to be a constant (not random) because this number is extremely large. After the x-ray photons travel through an attenuating/scattering object, the number of photons escaping from the object is significantly reduced and follows the Poisson noise model.
In fact, the number of photons emitted from the x-ray tube also follows a Poisson distribution [6]. This more accurate noise model is justified in multi-energy x-ray CT imaging, where the x-ray energy spectrum is considered. The x-ray energy spectrum can be divided into many sub energy windows (or bins); the number of x-ray photons in each window is Poisson. The mean number of photons in each energy window can be characterized by a spectrum function Φ(E) [7]. Since the source photon counts are random and Poisson, the detected (after-object) x-ray photons are compound Poisson distributed [8], [9], and their probability density function depends on the spectrum function Φ(E) through convolution. A simple Poisson model may not be accurate enough to exactly model the photon noise.
The purpose of this paper is to investigate whether a more accurate noise model always gives a better (e.g., less noisy with the same contrast) image. A hypothetical noise model is set up in this paper, and it is referred to as the “true model.” However, the exact noise variance for this true model is difficult to find. Many approximate models are presented and used for comparison purposes. It is well known that the linear attenuation coefficient of a material is energy dependent and this fact is the cause of the infamous beam-hardening artifacts in CT. The beam-hardening issues are not in the scope of this paper, and the linear attenuation coefficient is assumed to be energy independent in all of our computer simulations.
II. Methods
A. The Hypothetical Noise Model
The x-ray source emits I0 photons per projection ray. Since I0 is so large, it is justified to assume I0 to be a non-random constant. Let the energy spectrum distribution function of the x-ray source be Φ(E) which is normalized to unity (similar to a probability density function). A typical distribution Φ(E) is shown in Fig. 1.
Fig. 1.
Energy spectrum Φ(E) of an x-ray source.
The x-ray energy spectrum (before entering the object) can be divided into many sub energy windows (or bins); the number of x-ray photons in each window is Poisson distributed. The mean value of the number of x-ray photons in the kth energy window is I0Φ(Ek).
After the x-rays pass through the object, on the detector, each energy window produces a Poisson x-ray measurement pk, which can be modeled as
$$p_k \sim \mathrm{Poisson}\left\{ e^{-\int \mu(x)\,dx}\, \mathrm{Poisson}\left\{ I_0 \Phi(E_k) \right\} \right\} \tag{1}$$
where μ(x) is the linear attenuation coefficient, which is assumed to be independent of the photon energy in this paper; the line integral ∫ μ(x)dx is along the associated projection ray. Eq. (1) is essentially Beer's law [11].
If a random variable q follows the Poisson distribution, we symbolically denote it as
$$q \sim \mathrm{Poisson}\{\lambda\} \tag{2}$$
where λ is the mean value as well as the variance. Then (1) is a special case of
$$p = a_1\, \mathrm{Poisson}\left\{ a_2\, \mathrm{Poisson}\{\lambda\} \right\} \tag{3}$$
where a1 and a2 are two non-random constants. The mean and variance of the random variable p can be derived from the law of total expectation and the law of total variance [10]. Thus
$$\bar{p} = a_1 a_2 \lambda \tag{4}$$
$$\mathrm{Var}(p) = a_1^2 a_2 \lambda + a_1^2 a_2^2 \lambda \tag{5}$$
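As a sanity check (ours, not part of the original derivation), the following MATLAB sketch compares (4) and (5) against Monte Carlo estimates of the nested model in (3); the values of a1, a2, and λ are arbitrary examples.

```matlab
% Monte Carlo check of the mean (4) and variance (5) of the nested
% Poisson model p = a1 * Poisson{ a2 * Poisson{ lambda } }.
% (poissrnd requires the Statistics and Machine Learning Toolbox.)
a1 = 6;          % example value for a1 (e.g., a*Ek)
a2 = exp(-2);    % example transmission factor
lambda = 1e4;    % example mean source count
N = 1e6;         % number of realizations

q = poissrnd(lambda * ones(N, 1));   % source counts, Poisson{lambda}
p = a1 * poissrnd(a2 * q);           % nested Poisson, scaled by a1

fprintf('mean:     simulated %.1f, Eq.(4) %.1f\n', mean(p), a1*a2*lambda);
fprintf('variance: simulated %.1f, Eq.(5) %.1f\n', ...
        var(p), a1^2*a2*lambda + a1^2*a2^2*lambda);
```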
For the random variable pk in (1), the energy-integrating detection outputs a signal aEk pk, where Ek is the photon energy of the kth energy window and a is the system gain. Using the definition (3) with a1 = aEk, a2 = exp(− ∫ μ(x)dx), and λ = I0Φ(Ek), by (4) and (5), the mean and variance of the detected energy from pk are given as:
$$\mathrm{E}\left[ a E_k p_k \right] = a E_k I_0 \Phi(E_k)\, e^{-\int \mu(x)\,dx} \tag{6}$$
$$\mathrm{Var}\left( a E_k p_k \right) = (a E_k)^2 I_0 \Phi(E_k) \left( e^{-\int \mu(x)\,dx} + e^{-2\int \mu(x)\,dx} \right) \tag{7}$$
The total signal p received by the energy-integrating system along this ray should include the signals from all energy windows and the system noise d generated in the electronic circuits:
$$p = \sum_k a E_k p_k + d \tag{8}$$
where d can be assumed to be a zero-mean Gaussian random variable with variance σ². Using (6) and (7), the mean and variance of the random variable p in (8) can be obtained as
$$\bar{p} = a I_0 \bar{E}\, e^{-\int \mu(x)\,dx} \tag{9}$$
with the first moment of the x-ray source distribution
$$\bar{E} = \int E\, \Phi(E)\, dE \tag{10}$$
The variance of p is

$$\mathrm{Var}(p) = a^2 I_0 \bar{\bar{E}} \left( e^{-\int \mu(x)\,dx} + e^{-2\int \mu(x)\,dx} \right) + \sigma^2 \tag{11}$$
with the second moment
$$\bar{\bar{E}} = \int E^2\, \Phi(E)\, dE \tag{12}$$
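The moments (10) and (12) are plain numerical integrals of the tabulated spectrum. A minimal MATLAB sketch follows; since the measured Toshiba spectrum of Fig. 1 is not reproduced here, a hypothetical triangular Φ(E) stands in for it.

```matlab
% Spectrum moments (10) and (12) from a tabulated spectrum, and the
% mean (9) and variance (11) of the pre-log signal p for one ray.
% Phi below is a hypothetical placeholder spectrum, used only because
% the measured spectrum of Fig. 1 is not reproduced here.
E   = 1:120;                        % energy bins (keV)
Phi = max(0, 1 - abs(E - 60)/60);   % placeholder triangular spectrum
Phi = Phi / sum(Phi);               % normalize to unity

Ebar  = sum(E    .* Phi);           % first moment, Eq. (10)
Ebar2 = sum(E.^2 .* Phi);           % second moment, Eq. (12)

a = 0.1; I0 = 1e4; sigma = 6.3;     % parameters used in the paper
ybar = 2.0;                         % example line integral
pMean = a*I0*Ebar*exp(-ybar);                                % Eq. (9)
pVar  = a^2*I0*Ebar2*(exp(-ybar) + exp(-2*ybar)) + sigma^2;  % Eq. (11)
```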
The entire paper assumes that the attenuation coefficient is independent of the beam energy. Without this assumption, the mathematical derivations in this paper do not hold.
B. Post-Log Multi-Energy Noise
The mean and variance formulas in Section II-A are for the pre-log data. Normally, the post-log data are used for analytical and iterative reconstruction. Let the post-log data be obtained as
$$y = -\ln \frac{p}{a I_0 \bar{E}} \tag{13}$$
In (13), if p is less than 1, p is set to 1 before taking the logarithm. The exact variance of y is not easy to calculate; an approximation of it can be obtained as
$$\mathrm{Var}(y) \approx \frac{\mathrm{Var}(p)}{\bar{p}^2} = \frac{a^2 I_0 \bar{\bar{E}} \left( e^{-\bar{y}} + e^{-2\bar{y}} \right) + \sigma^2}{\left( a I_0 \bar{E}\, e^{-\bar{y}} \right)^2}, \qquad \bar{y} = \int \mu(x)\,dx \tag{14}$$
The approximation step in (14) is based on a truncated Taylor expansion of (13). The approximation error can be large when the quanta are low.
The Gauss–Markov theorem [12], [13] shows that in a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator of the coefficients is given by the ordinary least squares estimator. Here “best” means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors do not need to be Gaussian, nor do they need to be independent and identically distributed. If, however, the measurements have different uncertainties, Aitken showed that each weight should be equal to the reciprocal of the variance of the measurement [14]. Thus the optimal noise weighting in this case is the reciprocal of the noise variance:
$$w = \frac{1}{\mathrm{Var}(y)} = \frac{\left( a I_0 \bar{E}\, e^{-\bar{y}} \right)^2}{a^2 I_0 \bar{\bar{E}} \left( e^{-\bar{y}} + e^{-2\bar{y}} \right) + \sigma^2} \tag{15}$$
Hereafter, the noise weighting (15) will be referred to as the mathematical noise model weighting. However, in practice, the mean value ȳ of the post-log data is not available. The one-time measurement y is usually used instead.
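A toy MATLAB experiment (ours, for illustration only) shows Aitken's point: when one scalar is estimated from measurements with unequal variances, inverse-variance weighting yields a lower-variance estimate than plain averaging.

```matlab
% Toy illustration of Aitken's weighting: four measurements of the
% same scalar with different variances. Inverse-variance weighting
% gives a lower-variance estimate than the plain average.
truth  = 5;
sigmas = [0.5 1 2 4];               % standard deviations (example values)
N      = 1e5;                       % repeated experiments

y = truth + randn(N, numel(sigmas)) .* sigmas;   % simulated measurements
w = 1 ./ sigmas.^2;                              % weights = 1/variance

estPlain    = mean(y, 2);                        % unweighted average
estWeighted = (y * w') / sum(w);                 % inverse-variance average

fprintf('variance, plain average:            %.4f\n', var(estPlain));
fprintf('variance, inverse-variance average: %.4f\n', var(estWeighted));
```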
C. Computer Simulations
A gradient descent algorithm will be used to minimize the noise-weighted post-log maximum likelihood objective function:
$$F(\boldsymbol{\mu}) = \sum_j w_j \left( \sum_i a_{ij} \mu_i - y_j \right)^2 \tag{16}$$
where μi is the value of the ith image pixel (the linear attenuation coefficient), yj is the post-log measurement (i.e., the noisy Radon transform), aij is the contribution from the ith pixel μi to the jth measurement yj, and wj is the weighting factor, chosen to be the reciprocal of the noise variance of yj. The gradient descent algorithm is expressed as
$$\mu_i^{(k+1)} = \mu_i^{(k)} - \alpha\, \frac{\sum_j a_{ij} w_j \left( \sum_n a_{nj} \mu_n^{(k)} - y_j \right)}{\sum_j a_{ij} w_j \sum_n a_{nj}} \tag{17}$$
where $\mu_i^{(k)}$ is the estimate of μi at the kth iteration, and the step size α is a small positive constant that prevents the algorithm from diverging. The value of α is set to 0.2 in this paper. The purpose of the denominator is to normalize the weighting factors wj so that the step size of the algorithm is almost the same for any chosen weighting factor wj > 0. The summation over the index n is the projector, and the summation over the index j is the backprojector.
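In matrix form, one iteration of (17) might be sketched as follows, assuming a sparse system matrix A of size (number of rays) × (number of pixels), a weight vector w, post-log data y, and the current image estimate mu as column vectors.

```matlab
% One iteration of the update (17) in matrix form. Assumptions: A is a
% sparse system matrix (nRays x nPixels) with entries a_ij, w and y are
% nRays x 1 vectors, and mu is the nPixels x 1 current image estimate.
alpha = 0.2;                               % step size used in the paper

r   = A * mu - y;                          % forward projection minus data
num = A' * (w .* r);                       % weighted backprojection of residual
den = A' * (w .* (A * ones(size(mu))));    % normalization: sum_j a_ij w_j sum_n a_nj
den = max(den, eps);                       % guard pixels with no ray coverage
mu  = mu - alpha * (num ./ den);           % simultaneous update of all pixels
```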
The computer simulations in this paper are based on a scaled-down x-ray CT fan-beam imaging geometry with a curved detector. The energy spectrum Φ(E) was provided by Toshiba and is shown in Fig. 1. The image array was 256 × 256, the pixel size was 1.52 mm × 1.52 mm, the number of views was 400 over 360°, the number of detection channels was 400, the distance from the x-ray focal spot to the isocenter was 240 mm, and the virtual detector was placed at the isocenter. The x-ray source flux was I0 = 10⁴ counts per ray, which corresponds to a low-dose imaging setup. The phantom shown in Fig. 2 is a 355 mm × 187 mm ellipse with a water background (μ = 0.02/mm), three high-contrast regions (μ = 0.032/mm) of diameter 48 mm, and two low-contrast regions (μ = 0.0194/mm) of diameter 36 mm, surrounded by outer layers of fat (μ = 0.019/mm) and skin (μ = 0.021/mm). ROI 1 (a high-contrast object) and ROI 2 (the water background) are used to evaluate the image quality. The detailed phantom parameters are listed in Table I.
Fig. 2.
A computer generated torso phantom is used for transmission CT data generation. The linear attenuation coefficients are labeled in the unit of per mm. Two regions of interest (ROIs) are defined for image evaluation and for the image contrast calculation.
TABLE I.
Parameters of the phantom (in MATLAB phantom format)
| x0 (center) | y0 (center) | A (semi-axis) | B (semi-axis) | ϕ (rotation) | HU |
|---|---|---|---|---|---|
| 0.0 | −11.2 | 59.2 | 31.2 | 0 | 1050 |
| 0.0 | −11.2 | 58.4 | 30.4 | 0 | −100 |
| 0.0 | −12.0 | 56.0 | 28.0 | 0 | 50 |
| 40.0 | −12.0 | 8.0 | 8.0 | 0 | 600 |
| −40.0 | −12.0 | 8.0 | 8.0 | 0 | 600 |
| 0.0 | −24.0 | 8.0 | 8.0 | 0 | 600 |
| 24.0 | −20.0 | 6.0 | 6.0 | 15 | −30 |
| −24.0 | −20.0 | 6.0 | 6.0 | −15 | −30 |
The noisy projection data were generated using MATLAB (The MathWorks, Inc., Natick, MA, USA) as follows. First, for each ray, a noiseless line integral of the phantom was calculated. Second, using the exponential function, the line integral value was converted into noiseless transmission counts. Third, the noiseless transmission data at each energy bin were corrupted with nested Poisson noise according to (1); the random numbers were generated by MATLAB's built-in Poisson generator, poissrnd. Fourth, the summation of the noisy data from all energy bins was calculated, and the system Gaussian noise was added to the summation as in (8). Fifth, this noisy pre-log value was transformed into a noisy post-log value according to (13) for image reconstruction. The parameter a was set to 0.1 and σ was set to 6.3. The value of 6.3 is typical for a Toshiba CT system and was obtained from experimental measurements.
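A condensed MATLAB sketch of this five-step procedure for a vector of rays is given below; the projector producing the noiseless line integrals ybar (step 1) is assumed and not shown, and E, Phi, and Ebar are as in the sketch of Section II-A.

```matlab
% Condensed sketch of the five-step data generation for a vector of
% rays. ybar holds the noiseless line integrals from step 1 (the
% projector is not shown); E, Phi, Ebar, a, I0, sigma as defined above.
t = exp(-ybar);                       % step 2: noiseless transmission factors
p = zeros(size(ybar));
for k = 1:numel(E)                    % step 3: nested Poisson per bin, Eq. (1)
    src = poissrnd(I0 * Phi(k) * ones(size(ybar)));  % random source counts
    p   = p + a * E(k) * poissrnd(src .* t);         % detected, energy weighted
end
p = p + sigma * randn(size(p));       % step 4: electronic (Gaussian) noise, Eq. (8)
p = max(p, 1);                        % clamp to 1 before the logarithm
y = log(a * I0 * Ebar ./ p);          % step 5: post-log data, Eq. (13)
```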
In the entire data generation and reconstruction, the attenuation coefficients were assumed to be energy independent. The iterative algorithm was implemented according to (17) and stopped when a pre-specified image contrast was reached. Two cases were considered: (ROI1 − ROI2)/ROI2 ≥ 0.55 and (ROI1 − ROI2)/ROI2 ≥ 0.60. The reconstructed images from the different methods are compared using the normalized standard deviation in ROI 2, defined as the standard deviation divided by the mean.
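The stopping-rule quantities can be computed directly, assuming logical ROI masks roi1 and roi2 over the reconstructed image:

```matlab
% Stopping-rule quantities for a reconstructed image muImg, with roi1
% and roi2 assumed to be logical masks for ROI 1 and ROI 2 (Fig. 2).
m1 = mean(muImg(roi1));
m2 = mean(muImg(roi2));
contrast = (m1 - m2) / m2;            % (ROI1 - ROI2)/ROI2, stop at 0.55 or 0.60
nstd     = std(muImg(roi2)) / m2;     % normalized standard deviation in ROI 2
```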
This paper will use 12 methods of implementing and approximating the weighting factor wj, which is the reciprocal of the noise variance sinogram. These 12 methods of calculating the noise weighting are listed below.
Method 1 (Mathematical)
The weighting factor wj is calculated using (15), with a = 0.1, I0 = 10⁴, σ² = 6.3², and ȳj being the noiseless post-log projection. The x-ray tube energy distribution function Φ(E) was measured on a Toshiba CT scanner and is shown in Fig. 1. The energy spectrum is subdivided into 120 energy windows from 0 to 120 keV. The first and second moments of the spectrum, Ē and E̿, are calculated using (10) and (12); their numerical values are 67.8776 keV and 5036.9 (keV)², respectively.
Method 2 (Ignoring electronic noise)
This method is the same as Method 1, except that the electronic noise is ignored during image reconstruction. In other words, σ² = 6.3² is used in data generation, but σ² = 0 is assumed in reconstruction.
Method 3 (Statistical approach)
This method uses 1000 realizations of the noisy post-log data sets. The ensemble variance "sinogram" is calculated from these 1000 realizations, and the weighting function is the reciprocal of this variance "sinogram." We believe that this method is the most accurate among our 12 methods, while Method 1 may not be accurate when the line integral value is large.
Method 4 (Practical approach)
This method is almost the same as Method 1, except that the mean value ȳj is replaced by the one-time (i.e., one-noise-realization) post-log measurement yj. The electronic noise is modeled by σ².
$$w_j = \frac{\left( a I_0 \bar{E}\, e^{-y_j} \right)^2}{a^2 I_0 \bar{\bar{E}} \left( e^{-y_j} + e^{-2 y_j} \right) + \sigma^2} \tag{18}$$
Method 5 (Modified version of Method 1)
We replace the variance of Method 1 by a power function of that variance, with an exponent γ < 1:
$$w_j = \left[ \frac{\left( a I_0 \bar{E}\, e^{-\bar{y}_j} \right)^2}{a^2 I_0 \bar{\bar{E}} \left( e^{-\bar{y}_j} + e^{-2\bar{y}_j} \right) + \sigma^2} \right]^{\gamma} \tag{19}$$
Method 6 (Popular model)
This is a simplified version of (18). The expression is
$$w_j = \frac{\left( I_0\, e^{-y_j} \right)^2}{I_0\, e^{-y_j} + \sigma^2} \tag{20}$$
Method 7 (Practical Poisson approximation)
Instead of using the more accurate true model, this method uses the more practical (but less accurate) Poisson model and ignores the electronic noise, which leads to the variance sinogram being proportional to exp(ȳj). To make it practical, the mean value ȳj is replaced by the one-time measurement value yj.
$$w_j = e^{-y_j} \tag{21}$$
Method 8 (Constant weights, i.e., No weights)
The variance sinogram is set to a constant for all projection rays.
$$w_j = 1 \tag{22}$$
Method 9 (Totally wrong)
This method shows what damage wrong weighting factors can do to the reconstruction. The totally wrong variance sinogram is chosen to be exp(−yj), so that the weighting is
$$w_j = e^{y_j} \tag{23}$$
Method 10 (Method 7 with the noiseless y)
This method is more accurate than Method 7; here, ȳ is the noiseless true value.
$$w_j = e^{-\bar{y}_j} \tag{24}$$
Method 11 (Method 10, plus a power γ)
This method uses the same idea as Method 5, but applies it to the model of Method 10:
$$w_j = e^{-\gamma \bar{y}_j} \tag{25}$$
Method 12 (Method 11, 1 realization, 5-point average)
Method 11 is not practical, because the noiseless measurement is never available. On the other hand, the one-realization measurement is too noisy. This method replaces ȳ in (25) by a 5-point running-average (i.e., lowpass-filtered) value of y along the detector-channel direction. This method is practical.
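For reference, the weights of the exponential family (Methods 7, 10, 11, and 12) take only a few lines of MATLAB; here y is the noisy post-log sinogram (channels × views), ybar its noiseless counterpart (available only in simulation), and gam the exponent γ.

```matlab
% Weights for the exponential family of methods. y is the noisy
% post-log sinogram (channels x views), ybar its noiseless counterpart
% (available in simulation only), and gam the exponent (e.g., 0.4-0.5).
gam = 0.5;
w7  = exp(-y);                 % Method 7: one noisy realization, Eq. (21)
w10 = exp(-ybar);              % Method 10: noiseless weight, Eq. (24)
w11 = exp(-gam * ybar);        % Method 11: Method 10 with a power, Eq. (25)
ys  = movmean(y, 5, 1);        % 5-point running average along channels
w12 = exp(-gam * ys);          % Method 12: practical version of Method 11
```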
III. Results
A. Verification of Eq. (14)
The theoretical variance formula (14) for the noise model (8) is verified against the ensemble variance of 1000 noise realizations, with a = 0.1, I0 = 10⁴, and σ² = 6.3². The results are summarized in Fig. 3 for various values of ȳ = ∫ μ(x)dx. It is observed from Fig. 3 that when the line integrals ȳ are small, the variance formula (14) and the ensemble variance agree quite well. However, when ȳ becomes large, the number of photons I0 exp(−ȳ) is extremely small and can be zero. It is numerically unstable to take the logarithm of a value close to zero. Therefore, the noise variance estimate is unreliable when the line integrals of the object are large (e.g., when the object contains high-density materials).
Fig. 3.
Evaluation of the noise variance given by Eq. (14) by using 1000 noise realizations
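A sketch of this verification experiment (our reconstruction, with the hypothetical spectrum from the Section II-A sketch standing in for the measured one):

```matlab
% Ensemble verification of Eq. (14): compare the formula against the
% variance of 1000 noise realizations over a range of line integrals.
% Uses the placeholder spectrum variables E, Phi, Ebar, Ebar2 defined
% in the Section II-A sketch.
a = 0.1; I0 = 1e4; sigma = 6.3; M = 1000;
for ybar = 0.5:0.5:6
    p = zeros(M, 1);
    for k = 1:numel(E)                          % nested Poisson, Eq. (1)
        src = poissrnd(I0 * Phi(k) * ones(M, 1));
        p   = p + a * E(k) * poissrnd(src * exp(-ybar));
    end
    y = log(a * I0 * Ebar ./ max(p + sigma*randn(M,1), 1));   % Eq. (13)
    vSim = var(y);
    vThy = (a^2*I0*Ebar2*(exp(-ybar) + exp(-2*ybar)) + sigma^2) ...
           / (a*I0*Ebar*exp(-ybar))^2;          % Eq. (14)
    fprintf('ybar = %.1f: ensemble %.3e, formula %.3e\n', ybar, vSim, vThy);
end
```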
B. Iterative Reconstruction
The reconstructed images for all 12 methods are shown in Fig. 4 and Fig. 5 for contrasts of 0.55 and 0.60, respectively. A rectangular uniform region of interest, ROI 2 (shown in Fig. 2), is selected to compare the normalized noise standard deviations in the reconstructed images. The noise weighting methods are compared in terms of the noise in ROI 2, and the results are listed in Table II. The noise in the image with Method 9 is the worst, as expected. Surprisingly, the simple exponential method with a 5-point running-average lowpass filter and a power γ (Method 12) outperforms all other methods.
Fig. 4.
Reconstructed images with Methods 1–12, using a stopping rule that a pre-specified image contrast of 0.55 is reached.
Fig. 5.
Reconstructed images with Methods 1–12, using a stopping rule that a pre-specified image contrast of 0.60 is reached.
TABLE II.
Ranking of the methods in terms of ROI 2 normalized standard deviation
| Rank | Method |
|---|---|
| 1 (best) | Method 12 (Exponential, 1 noise realization, 5-point running average) |
| 2 | Method 11 (Exponential, with a power γ, noiseless weight) |
| 3 | Method 5 (Mathematical with a power γ) |
| 4 | Method 2 (Mathematical, ignoring electronic noise) |
| 5 | Method 1 (Mathematical) [according to Fig. 4]; Method 10 (Exponential, noiseless weight) [according to Fig. 5] |
| 6 | Method 10 (Exponential, noiseless weight) [according to Fig. 4]; Method 1 (Mathematical) [according to Fig. 5] |
| 7 | Method 7 (Exponential, 1 noise realization) |
| 8 | Method 3 (1000 noise realizations) [according to Fig. 4]; Method 8 (Constant weight) [according to Fig. 5] |
| 9 | Method 8 (Constant weight) [according to Fig. 4]; Method 4 (Mathematical, 1 noise realization) [according to Fig. 5] |
| 10 | Method 6 (Popular) [according to Fig. 4]; Method 3 (1000 noise realizations) [according to Fig. 5] |
| 11 | Method 4 (Mathematical, 1 noise realization) [according to Fig. 4]; Method 6 (Popular) [according to Fig. 5] |
| 12 (worst) | Method 9 (Totally wrong) |
It is interesting to observe that the most accurate model, using 1000 noise realizations (Method 3), does not give the best image. The results are almost the same whether the electronic noise is modeled or ignored (Method 1 vs. Method 2). Methods 1 and 2 are not practical, because the true noise variance cannot be obtained from one noise realization in practice. In reality, the noise variance is estimated from the measured data (i.e., one noise realization), and this estimate is noisy. The noise from the weighting function propagates into the reconstruction, generating a noisier image (Method 4) than the ensemble approach (Method 3). Method 7, which approximates the mathematical noise model by a simple Poisson model and ignores the electronic noise, generates a better result than Method 4, which uses a more accurate noise model.
Noise in the weighting function can contribute to the noise in the reconstruction. This can also be observed by comparing the results of Method 7 and Method 10. The most accurate noise model (Method 3) does not necessarily give the best image. In fact, the most accurate noise model, as well as many other approximate methods (Methods 1, 2, 4, 6, 7, and 10), gives low-frequency shadowing artifacts.
The weighting functions with a power γ < 1 (Methods 5, 11, and 12) do not show any low-frequency shadowing artifacts. When γ is small (Method 5 with γ = 0.2), the weighting differences between projection bins become smaller, and the image quality approaches that of uniform weighting (Method 8), where streaking noise artifacts start to appear. A moderate γ of 0.4 to 0.5 in Methods 11 and 12 seems to be a good compromise between low-frequency shadowing and streaking noise artifacts.
A much simpler model (Method 12) outperforms all other methods. Method 12 is a practical method that uses a lowpass-filtered version of the measurement to reduce the noise in the weighting function. The filter is a 5-point running average of the noisy data. We must point out that the smoothed data are only used to form the weighting function wj in (17); the projections yj in (17) are unsmoothed.
Figure 6 illustrates that as the iteration number increases, the image contrast increases. Figure 7 illustrates that as the iteration number increases, the image noise (in terms of the normalized standard deviation in a uniform region) also increases.
Fig. 6.
Contrast vs. iteration number for (upper) Method 1 and (lower) Method 7. The contrast is defined as (ROI1 − ROI2)/ROI2.
Fig. 7.
Normalized standard deviation in ROI 2 vs. iteration number for (upper) Method 1 and (lower) Method 7.
IV. Conclusions
Our computer simulations show that almost the same noise variance results from the more accurate 1000-realization noise model as from the less accurate Poisson noise model.
This paper investigates whether a more accurate noise model always gives better (meaning: less noisy) images than the less accurate Poisson noise model. This paper gives a counterexample: a more accurate noise model does not necessarily give a better image.
Computer simulations using Methods 1, 2, 3, 4, 6, 7, and 10 give similar results, even though they are ad hoc. This implies that it is reasonable to approximate the more accurate noise model by the less accurate Poisson model. Another observation is that using the one-time measurement to approximate the mean value of the measurement can introduce some noise into the output image. Smoothing the projection measurements may be a remedy when they are used to form the weighting factors.
Methods 1, 2, 3, 4, 6, 7, and 10 can generate some low-frequency shadowing artifacts. If the weighting function is modified by a power function of itself, these low-frequency shadowing artifacts can be reduced without degrading the image contrast (see Methods 5, 11, and 12). A thorough study of the weighting function with an exponent parameter is outside the scope of this paper and will be conducted in a separate paper.
Finding the optimal maximum likelihood solution was well established a long time ago. Assuming an uncorrelated noise model, the optimal noise weighting is the reciprocal of the noise variance. However, this principle does not directly apply in medical imaging, because the maximum likelihood solution is usually too noisy to be useful. In practice, iterative algorithms are terminated early, before convergence is reached. If the algorithm is stopped early, so that it does not search for the noisy "maximum likelihood" solution, the strategy of using the reciprocal of the noise variance as the noise weighting factor may not be optimal. It is still an open problem how to select the noise weighting factor that leads to the "minimum noise" solution at a pre-specified image contrast.
The noise weighting does matter. The weighting functions can be categorized into four types:
Over-weighted: Some low-frequency shadowing artifacts can be seen. Many commonly used weighting functions tend to be over-weighted.
Properly weighted: Images have the fewest artifacts and the lowest noise. An over-weighted weighting function can be tuned down to a properly weighted one by using an exponent γ that is less than 1.
Under-weighted: Almost like no weighting at all. Some streaking artifacts can be seen. The weighting effectiveness can be improved by using an exponent γ that is greater than 1.
Wrongly weighted: The weighting is in the wrong direction; it emphasizes the rays that should be deemphasized.
Acknowledgments
This work was supported in part by NIH Grant R01HL108350.
Footnotes
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Contributor Information
Gengsheng L. Zeng, Department of Engineering, Weber State University, Ogden, UT 84408 USA, and also with the Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT 84108 USA (larryzeng@weber.edu; larry.zeng@hsc.utah.edu).
Wenli Wang, Toshiba Medical Research Institute USA, Inc., Vernon Hills, IL 60061 USA, and also with Columbia University Medical Center, New York, NY 10032 USA.
References
- 1. Lange K, Carson R. EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 1984;8(2):306–316.
- 2. Lange K, Bahn M, Little R. A theoretical study of some maximum likelihood algorithms for emission and transmission tomography. IEEE Trans. Med. Imag. 1987 Jun;6(2):106–114. doi: 10.1109/TMI.1987.4307810.
- 3. Browne JA, Holmes TJ. Developments with maximum likelihood X-ray computed tomography. IEEE Trans. Med. Imag. 1992 Mar;11(1):40–52. doi: 10.1109/42.126909.
- 4. Fessler JA. Statistical image reconstruction methods for transmission tomography. In: Sonka M, Fitzpatrick JM, editors. Handbook of Medical Imaging: Medical Image Processing and Analysis. Vol. 2. Bellingham, WA, USA: SPIE; 2000. pp. 1–70.
- 5. Buzug TM. Computed Tomography: From Photon Statistics to Modern Cone-Beam CT. Berlin, Germany: Springer-Verlag; 2010.
- 6. Barrett HH, Myers KJ. Foundations of Image Science. Hoboken, NJ, USA: Wiley; 2004. pp. 1101–1104.
- 7. O'Sullivan JA, Benac J. Alternating minimization algorithms for transmission tomography. IEEE Trans. Med. Imag. 2007 Mar;26(3):283–297. doi: 10.1109/TMI.2006.886806.
- 8. Lasio GM, Whiting BR, Williamson JF. Statistical reconstruction for X-ray computed tomography using energy-integrating detectors. Phys. Med. Biol. 2007;52(8):2247–2266. doi: 10.1088/0031-9155/52/8/014.
- 9. Whiting BR, Massoumzadeh P, Earl OA, O'Sullivan JA, Snyder DL, Williamson JF. Properties of preprocessed sinogram data in X-ray computed tomography. Med. Phys. 2006;33(9):3290–3303. doi: 10.1118/1.2230762.
- 10. Weiss NA. A Course in Probability. Boston, MA, USA: Addison-Wesley; 2005. pp. 380–383.
- 11. Hsieh J. Computed Tomography. Bellingham, WA, USA: SPIE Press; 2003.
- 12. Plackett RL. Some theorems in least squares. Biometrika. 1950;37(1–2):149–157.
- 13. Zyskind G, Martin FB. On best linear estimation and general Gauss-Markov theorem in linear models with arbitrary nonnegative covariance structure. SIAM J. Appl. Math. 1969;17(6):1190–1202.
- 14. Aitken AC. On least squares and linear combinations of observations. Proc. Roy. Soc. Edinburgh. 1935;55:42–48.
- 15. Fan Y, Zamyatin AA, Nakanishi S. Noise simulation for low-dose computed tomography. Proc. IEEE Nucl. Sci. Symp. Med. Imag. Conf. (NSS/MIC). 2012 Oct:3641–3643.