Abstract
Computed tomography (CT) is widely used to diagnose many diseases. Low-dose CT has been actively pursued to lower the ionizing radiation risk. A relatively smoother kernel is typically used in low-dose CT to suppress image noise, which may sacrifice spatial resolution. In this work, we propose a texture transformer network to simultaneously reduce image noise and improve spatial resolution in CT images. This network, referred to as Texture Transformer for Super Resolution (TTSR), is a reference-based deep-learning image super-resolution method built upon a generative adversarial network (GAN). The noisy low-resolution CT (LRCT) image and the routine-dose high-resolution CT (HRCT) image serve as the query and key in a transformer, respectively. Image translation is optimized through deep neural network (DNN) texture extraction, relevance embedding, and attention-based texture transfer and synthesis to achieve joint feature learning between LRCT and HRCT images for super-resolution CT (SRCT) images. To evaluate SRCT performance, we use both simulated data from the XCAT phantom program and real patient data. Peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and feature similarity (FSIM) index are used as quantitative metrics. For comparison, cubic spline interpolation, SRGAN (a GAN super-resolution method with an additional content loss), and GAN-CIRCLE (a GAN super-resolution method with cycle consistency) were used. Compared with these methods, TTSR restores more details in SRCT images and achieves better PSNR, SSIM, and FSIM for both simulated and real-patient data. In addition, we show that TTSR yields better image quality and demands much less computation time than denoising high-resolution low-dose CT images with block-matching and 3D filtering (BM3D) or GAN-CIRCLE.
In summary, the proposed TTSR method based on texture transformer and attention mechanism provides an effective and efficient tool to improve spatial resolution and suppress noise of low-dose CT images.
Keywords: CT super-resolution, texture transformer super-resolution (TTSR), generative adversarial network (GAN), GAN with cycle-consistency (GAN-CIRCLE), low-dose CT
I. INTRODUCTION
Computed tomography (CT) is a common technique in modern medicine, with millions of exams performed each year. However, due to the technical limitations of clinical scanners, CT images with typical millimeter or submillimeter resolution can hardly resolve structures on the order of tens of microns needed for certain physiological and pathological applications [1], e.g., coronary artery analysis [2]. High-resolution CT (HRCT) can be achieved through hardware innovation, such as smaller detector elements and pitches. Not only is this costly, but the elevated quantum noise also becomes an issue if the incident X-ray intensity does not increase accordingly. Furthermore, low-dose CT has been actively pursued recently to lower the ionizing radiation to patients. A relatively smoother kernel is typically used in low-dose CT to suppress image noise, which may sacrifice spatial resolution. An alternative approach is to apply noise reduction methods to control image noise in low-dose CT. Many advanced algorithms have been developed to alleviate the noise in the projection domain, in the image domain, or both. However, these algorithms may still suffer from degradation of spatial resolution, especially at low contrast levels. It is therefore desirable to develop algorithms that can simultaneously suppress image noise and enhance spatial resolution (super-resolution, or SR).
There are three categories of computational methods proposed to suppress noise and enhance spatial resolution for low-resolution CT (LRCT) images: 1) model-based iterative reconstruction methods [3, 4]; 2) sparse representation methods [5–7]; and 3) deep learning methods [8–11]. In [11], a generative adversarial network (GAN) [12] with a perceptual loss function combining an adversarial loss and a content loss, called "SRGAN", was proposed to improve natural image resolution and was used as a comparative method for super-resolution CT (SRCT) in [10]. Recently, GAN-CIRCLE [10], a GAN using residual and cycle-consistent learning [13], was proposed to produce SRCT images. Extensive experiments have shown the SRCT performance of GAN-CIRCLE to be superior or comparable to other state-of-the-art SR methods, including SRGAN.
In this work, we propose a texture transformer network to simultaneously reduce image noise and improve spatial resolution in CT images. This network, referred to as Texture Transformer for Super Resolution (TTSR) [14], is a reference-based, GAN-based deep-learning image super-resolution method that achieves SRCT from noisy LRCT. To the best of our knowledge, this is the first application of a transformer to SRCT, particularly for low-dose CT. TTSR outperforms state-of-the-art methods, such as SRGAN and GAN-CIRCLE, on both simulated XCAT phantom data and the Mayo low-dose CT dataset. We also show that TTSR allows larger detector elements and fewer detectors for high-resolution CT scans and saves computation time.
II. METHODS
A. SRGAN
SRGAN [11] adapts the basic generator and discriminator structure of a GAN to build a generator that maps LRCT to HRCT. The generator employs a residual network with skip connections and uses a perceptual loss on high-level feature maps from a VGG network in addition to the adversarial loss. SRGAN is a supervised method requiring paired LRCT and HRCT images for training. It has shown decent performance for SRCT [10]; more details can be found in [10, 11].
B. GAN-CIRCLE
The basic structure of GAN-CIRCLE [10] is composed of two GANs: one learns the forward mapping from LRCT to HRCT (generator G: x → y and discriminator Dy), and the other learns the backward mapping from HRCT to LRCT (generator F: y → x and discriminator Dx). The GAN-CIRCLE principles are shown in Fig. 1. To learn the two mappings without paired LRCT and HRCT images, the cycle consistency loss is used to enforce F(G(x)) = x and G(F(y)) = y, as indicated by the dashed arrows in Fig. 1. Although GAN-CIRCLE still needs the general labels of LRCT and HRCT images for training, a one-to-one correspondence between them is not required. More details of GAN-CIRCLE can be found in [10, 15].
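The cycle-consistency constraint can be sketched numerically. In this toy example, the generators G and F are simple invertible stand-ins (not the paper's networks), used only to show how the loss enforces F(G(x)) ≈ x and G(F(y)) ≈ y:

```python
import numpy as np

# Toy numerical sketch of the cycle-consistency loss. G and F are
# invertible stand-ins for the generators (not the paper's networks);
# the loss is zero exactly when F inverts G and vice versa.
def G(x):  # hypothetical forward generator, LRCT -> HRCT
    return 2.0 * x

def F(y):  # hypothetical backward generator, HRCT -> LRCT
    return 0.5 * y

def cycle_consistency_loss(x, y):
    """L1 cycle loss enforcing F(G(x)) ~ x and G(F(y)) ~ y."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

x = np.random.rand(8)  # fake LRCT sample
y = np.random.rand(8)  # fake HRCT sample
loss = cycle_consistency_loss(x, y)  # 0.0 here, since F is the exact inverse of G
```

In training, this loss is minimized jointly with the two adversarial losses so that the unpaired mappings remain mutually consistent.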
Fig. 1.
The principles of GAN-CIRCLE. G: generator from LRCT (x) to HRCT (y); F: generator from HRCT (y) to LRCT (x); Dy: discriminator for real or fake HRCT images; Dx: discriminator for real or fake LRCT images. The dashed arrows enforce the cycle-consistency loss.
C. TTSR with attention mechanism
TTSR [14] is a reference-based image super-resolution method through a GAN mechanism (the blue branch of the forward mapping from LRCT to HRCT in Fig. 1). A texture transformer network is used as the generator to translate the LRCT images to HRCT images. The LRCT and HRCT images serve as queries and keys in the transformer, respectively. The image translation is optimized through deep neural network (DNN) texture extraction, relevance embedding, and attention-based texture transfer and synthesis to enable joint feature learning between LRCT and HRCT images. The generator of TTSR is shown in Fig. 2. Q, K, and V are the texture features extracted from an up-sampled LR image, a sequentially down/up-sampled reference image, and an original reference image, respectively. F denotes the features extracted from the LR image, which are fused with the transferred texture features T to generate the SR output. ↑ represents upsampling and ↓ represents downsampling. The discriminator is used to distinguish HRCT from SRCT images. The detailed network structures of the generator and discriminator can be found in the Appendix. The parameters of the TTSR network are optimized using a loss function composed of three parts: 1) the L1 difference between the SRCT and HRCT images; 2) the GAN loss; and 3) the perceptual loss on feature maps. More details of TTSR can be found in [14].
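The relevance embedding and attention-based transfer can be illustrated with a small numpy sketch (our simplification of [14], not the released TTSR code). Q, K, and V hold unfolded patch features; each query patch is matched to its most relevant key patch by normalized inner product, and the corresponding value (HRCT texture) patch is transferred along with a soft-attention weight:

```python
import numpy as np

# Toy sketch of TTSR's relevance embedding with hard/soft attention
# (our simplification, not the paper's implementation).
rng = np.random.default_rng(0)
n_q, n_k, d = 5, 7, 16              # query patches, key patches, feature dim
Q = rng.standard_normal((n_q, d))   # features of the up-sampled LRCT
K = rng.standard_normal((n_k, d))   # features of the down/up-sampled HRCT reference
V = rng.standard_normal((n_k, d))   # features of the original HRCT reference

def texture_transfer(Q, K, V):
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=1, keepdims=True)
    R = Qn @ Kn.T           # relevance map of shape (n_q, n_k)
    idx = R.argmax(axis=1)  # hard attention: best-matching key per query
    S = R.max(axis=1)       # soft-attention weight used when fusing with F
    T = V[idx]              # transferred HRCT texture features
    return T, S

T, S = texture_transfer(Q, K, V)    # T: (5, 16) transferred features
```

In TTSR the transferred features T are then weighted by S and fused with the LR features F to synthesize the SR output.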
Fig. 2.
The generator of the texture transformer super resolution (TTSR) network. Q: texture features extracted from an up-sampled LRCT image; K: texture features extracted from a sequentially down/up-sampled reference HRCT image; V: texture features from an original HRCT reference image; F: features from a LRCT image; T: transferred texture features; ↑: upsampling; ↓: downsampling.
III. EXPERIMENT SETUP
To evaluate the super-resolution performance of TTSR, we performed both a phantom study using the XCAT phantom and a patient study using a clinical dataset from the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge [16]. Cubic spline interpolation, SRGAN [11], and GAN-CIRCLE [10] were used for comparison.
A. Evaluation metrics
We use three metrics to quantitatively evaluate the different methods: peak signal-to-noise ratio (PSNR) [17], structural similarity index measure (SSIM) [18, 19], and feature similarity (FSIM) index [20]. PSNR and SSIM are defined as

PSNR = 10·log10(MAXy² / MSE),

SSIM = (2μxμy + c1)(2σxy + c2) / ((μx² + μy² + c1)(σx² + σy² + c2)),

where x denotes the SRCT image, y is the corresponding HRCT image, MSE is the mean squared error between x and y, MAXy denotes the maximum intensity value in the HRCT image y, μx and μy are the means of the SRCT image x and the HRCT image y, σx², σy², and σxy are the corresponding variances and covariance, and c1 and c2 are two constants to stabilize the division. FSIM is a structure-based image quality assessment using phase congruency (PC) and image gradient magnitude (GM). The local quality maps, i.e., PCSRCT, PCHRCT and GMSRCT, GMHRCT, are used to calculate the similarity measures SPC and SGM. The combined similarity is defined as SL = SPC•SGM, where • is pixel-wise multiplication. Finally, the FSIM index is defined as

FSIM = ΣΩ SL·PCm / ΣΩ PCm, with PCm = max(PCSRCT, PCHRCT),

where ΣΩ represents the sum over the image spatial domain Ω.
The PSNR measures the super-resolution performance by calculating the overall intensity difference between the SRCT image (translated from the LRCT image) and the HRCT image. The SSIM measures the perceptual similarity between the SRCT and HRCT images while weighting all locations equally. The FSIM measures the similarity of low-level feature sets between the SRCT and HRCT images. The higher the PSNR, SSIM, and FSIM values, the better the SR performance.
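As a concrete illustration, PSNR and SSIM can be sketched in numpy. Note that this SSIM uses a single global window for clarity, whereas practical evaluations typically use a sliding-window SSIM (e.g., skimage.metrics.structural_similarity), and the constants c1 and c2 are example values, not the ones used in our evaluation:

```python
import numpy as np

# Numpy sketch of the metrics above. This SSIM uses one global window for
# clarity; practical evaluations use a sliding window. c1 and c2 are
# example stabilizing constants (assumed values).
def psnr(x, y):
    mse = np.mean((x - y) ** 2)                 # mean squared error
    return 10.0 * np.log10(y.max() ** 2 / mse)  # MAX_y^2 / MSE, in dB

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(1)
y = rng.random((64, 64))                        # stand-in HRCT slice
x = y + 0.01 * rng.standard_normal((64, 64))    # stand-in SRCT with residual error
```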
B. Phantom data experiments
The 4D XCAT phantom program [21], based on 18 patients' CT data, was used to produce projection data through a ray-tracing algorithm [22, 23]. Photon noise at 1×10⁵ and 2×10⁴ photons per ray and electronic noise of 10 were added for the full-dose and low-dose projections, respectively [24]. The benchmark HRCT images (1×10⁵ photons/ray) were reconstructed as 512×512 slices with 1 mm × 1 mm pixel size. For LRCT images, the photon counts were reduced to 20% of HRCT, i.e., a lower radiation dose (2×10⁴ photons/ray), and 128×128 slices with 4 mm × 4 mm pixel size were reconstructed. Seventeen patients' data were used for training, while the remaining patient's data were used for testing. The TTSR model was pre-trained on the CUFED dataset (i.e., only natural images, no CT images). The pre-trained model was further tuned for 10 epochs using the simulated CT images with a learning rate of 1×10⁻⁴. The reference image used to obtain the SRCT image for a test LRCT image can be: 1) an HRCT image totally different from the LRCT image ("random"), e.g., a different organ area from a different patient (in the training set); 2) an HRCT image similar to the LRCT image ("partially aligned"), e.g., a similar organ area from a different patient (in the training set); or 3) an HRCT image identical to the scene of the LRCT image ("fully aligned"), e.g., the same organ area from the same patient (in the test set), as shown in Fig. 3 (green lines for random, red lines for partially aligned, and blue lines for fully aligned). Note that the first two types of reference images are not from the test set and represent realistically achievable performance, whereas the third is trivial (the corresponding HRCT image exists) but provides the best benchmark. For comparison, cubic spline interpolation, SRGAN, and GAN-CIRCLE were also used to obtain SRCT images from LRCT images.
SRGAN was trained with a learning rate of 1×10⁻⁴, and GAN-CIRCLE was trained with λ = 10 for the consistency loss and a learning rate of 1×10⁻⁴.
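The low-dose projection simulation described above can be sketched as follows (an illustration of the general noise model, not our exact simulation code): noiseless line integrals are turned into transmitted photon counts via Beer-Lambert attenuation, Poisson photon noise and Gaussian electronic noise are added, and the noisy line integrals are recovered by a log transform. Here I0 and sigma_e stand for the photons-per-ray and electronic-noise levels quoted above:

```python
import numpy as np

# Illustrative sketch (assumed model, not the paper's exact code) of
# the noisy projection simulation: Beer-Lambert transmission, Poisson
# photon noise, Gaussian electronic noise, then a log transform back
# to line integrals.
def noisy_sinogram(line_integrals, I0=2e4, sigma_e=10.0, seed=0):
    rng = np.random.default_rng(seed)
    counts = I0 * np.exp(-line_integrals)   # expected transmitted photons
    counts = rng.poisson(counts) + rng.normal(0.0, sigma_e, line_integrals.shape)
    counts = np.clip(counts, 1.0, None)     # guard against log of non-positive counts
    return np.log(I0 / counts)              # noisy line integrals

p = np.full((64, 64), 2.0)                  # toy sinogram of line integrals
p_low = noisy_sinogram(p, I0=2e4)           # low dose: 2x10^4 photons/ray
p_full = noisy_sinogram(p, I0=1e5)          # full dose: 1x10^5 photons/ray
```

As expected, the low-dose sinogram carries noticeably more noise than the full-dose one.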
Fig. 3.
Three types of reference images of TTSR to obtain SRCT images for the test LRCT images: a) Green lines, random; b) Red lines, partially aligned; c) Blue lines, fully aligned, trivial in a real application. k: the number of patients in the training set; n: the number of patients in the test set (n=1 in this study).
C. Patient data experiments
The 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge dataset contains 10 anonymized patients' full-dose and corresponding low-dose (1/4 of the full dose) CT projection data. First, we used the conjugate gradient least squares (CGLS) iterative method [25] to reconstruct LRCT and HRCT images. HRCT images were reconstructed from the full-dose projection data (736×64 detector matrix, detector size 1.2858 mm × 1.0947 mm per pixel, helical pitch 0.6) with an image size of 512×512. To obtain LRCT images, the low-dose projection data were binned from 736×64 to 184×64, i.e., 4× binning in the in-plane direction, and LRCT images were reconstructed with an image size of 128×128. As the low-dose projection data used only 1/4 of the full dose, the 4× binning roughly achieved a noise level similar to that of the full dose. For the GAN-CIRCLE and TTSR methods, 9 patients' data were used for training, while the remaining patient's data were used for testing. When training TTSR for the patient data, we found that the model pre-trained on the CUFED dataset ("pre-trained") did not converge faster to good SRCT performance (see "Discussion"), in contrast to the phantom data. Thus, we randomized the network parameters of TTSR and trained it on the patient data from scratch with a learning rate of 8×10⁻⁵ for 150 epochs. For SRGAN, a learning rate of 5×10⁻⁴ and 200 epochs were used. For GAN-CIRCLE, a learning rate of 2×10⁻⁴ and 100 epochs were used. For the reference image of a test LRCT image in TTSR, we used a similar slice from the training HRCT images to represent realistic performance. The evaluation of the SR performance of the different methods in this patient study is shown in Fig. 4.
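The 4× in-plane detector binning (736 → 184 channels) can be sketched as a reshape-and-sum over adjacent channels. This is an illustrative implementation; whether counts are summed or averaged depends on the downstream normalization:

```python
import numpy as np

# Illustrative 4x in-plane detector binning (our sketch): every four
# adjacent channels of a 736-channel projection row are summed,
# giving 184 channels.
def bin_detector(proj, factor=4):
    n_views, n_det = proj.shape
    assert n_det % factor == 0, "channel count must be divisible by the factor"
    return proj.reshape(n_views, n_det // factor, factor).sum(axis=2)

proj = np.ones((1152, 736))   # toy projections: views x detector channels
binned = bin_detector(proj)   # shape (1152, 184); each bin sums 4 channels
```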
Fig. 4.
Evaluation of SRCT for the real patient data study. (CGLS: conjugate gradient least squares)
Finally, we conducted a study to compare the aforementioned TTSR method (which translates LRCT to HRCT) with denoising of high-resolution low-dose CT using block-matching and 3D filtering (BM3D) [26] and GAN-CIRCLE [10]. We used GAN-CIRCLE instead of SRGAN because GAN-CIRCLE has been investigated for low-dose CT denoising [15], while SRGAN has not. For the denoising methods, high-resolution 512×512 low-dose CT images were first reconstructed from the low-dose high-resolution projections (736×64 projection matrix). Then, SRCT images from TTSR were compared with these high-resolution low-dose CT images denoised by BM3D and GAN-CIRCLE. Note that both approaches use the same radiation dose. However, TTSR has two advantages over high-resolution low-dose CT denoising: 1) the hardware requirement is less demanding (184 in-slice detector elements for TTSR vs. 736 for high-resolution low-dose CT denoising); and 2) the reconstruction time of 128×128 images is much less than that of 512×512 images. Once TTSR and GAN-CIRCLE were trained, the times for applying the models were on the same order (see Table IV).
TABLE IV.
Computation Time for Generating the Test Patient Images (471 slices)
| Method | Time (seconds) |
|---|---|
| CGLS of 128×128 with 4× projection binning | 163 |
| CGLS of 512×512 | 1396 |
| BM3D | 3499 |
| GAN-CIRCLE | 29 |
| TTSR | 80 |
IV. RESULTS
A. Phantom data results
The SRCT images from the different methods are shown in Fig. 5 along with the original HRCT image (Fig. 5a). Note that the original LRCT images (128×128) are too small to show here. Cubic spline interpolation is one of the simplest SR methods; however, as shown in Fig. 5b, the interpolated SR image is blurry and suffers from noise. SRGAN (Fig. 5c) greatly improves the clarity of fine structures. GAN-CIRCLE (Fig. 5d) yields an SR image with better-defined edges and more suppressed noise than the interpolation method, but still seems to oversmooth the image. TTSR (Fig. 5e–g) shows more details than the other methods, although some artifacts seem to be present in the SR images (e.g., in the heart). The differences among the TTSR results using different reference images are small. The magnified view of the red box in Fig. 5a is shown in Fig. 6. Both interpolation and GAN-CIRCLE are unable to recover the lung nodules clearly and cause blurred edges, while SRGAN and TTSR provide much better resolution recovery. Although SRGAN maintains sharp edges well, it produces piece-wise smoothness and does not preserve the small structures inside the lung as well as TTSR. Fully aligned TTSR preserves the nodule shape better than the other TTSR variants. The quantitative results averaged over 128 slices of the test patient are listed in Table I. TTSR achieves the best PSNR, SRGAN yields the best SSIM, and TTSR and SRGAN tie for the best FSIM. As SRGAN and TTSR use paired training data, they seem to achieve better SRCT performance than GAN-CIRCLE, which uses unpaired training. Some artifacts are noticeable in the TTSR images inside the heart (Fig. 5), which may have caused the SSIM of TTSR to be inferior to that of SRGAN and GAN-CIRCLE. Nevertheless, these results suggest that previously acquired HRCT images can be used as reference images to improve newly acquired LRCT images at much reduced radiation dose, without the requirement of alignment (for TTSR-random and TTSR-partially aligned).
Fig. 5.
HRCT image and SRCT images from different methods (a: HRCT; b: cubic spline interpolation; c: SRGAN; d: GAN-CIRCLE; e: TTSR random; f: TTSR partially aligned; g: TTSR fully aligned). (Display window: [−1345, 782] HU)
Fig. 6.
HRCT image and SRCT images from different methods for the red box area in Fig. 5a. (a: HRCT; b: cubic spline interpolation; c: SRGAN; d: GAN-CIRCLE; e: TTSR random; f: TTSR partially aligned; g: TTSR fully aligned). (Display window: [−1345, 150] HU)
TABLE I.
Quantitative Results for Different Methods for the phantom data
| Methods | PSNR (dB) | SSIM | FSIM |
|---|---|---|---|
| Cubic spline | 31.57±0.89 | 0.74±0.02 | 0.87±0.01 |
| SRGAN | 35.84±0.41 | 0.89±0.02 | 0.95±0.01 |
| GAN-CIRCLE | 34.85±0.32 | 0.88±0.02 | 0.94±0.01 |
| TTSR/random | 37.23±0.42 | 0.84±0.02 | 0.93±0.01 |
| TTSR/partially aligned | 37.23±0.42 | 0.84±0.02 | 0.93±0.01 |
| TTSR/fully aligned | 37.99±0.49 | 0.86±0.02 | 0.95±0.01 |
B. Patient data results
The images from the different methods for real patient data are shown in Fig. 7 (thighs) and Fig. 8 (thorax). Again, the original LRCT images (128×128) are too small to show here. The interpolation (Figs. 7b, 8b) and GAN-CIRCLE (Figs. 7d, 8d) SR images are somewhat blurry, although GAN-CIRCLE performs slightly better. SRGAN (Figs. 7c, 8c) suffers less blurring, but its SR images seem to show different patterns from the HRCT images, e.g., in the thighs (Fig. 7c) and in the lungs (Fig. 8c). Note that the TTSR method uses a similar HRCT slice in the training set as the reference image, corresponding to TTSR-partially aligned in the simulation study. The TTSR images (Figs. 7e, 8e) successfully recover image resolution without notable artifacts and yield the images most similar to the original HRCT images (Figs. 7a, 8a). The quantitative results in Table II also show the superior SR performance of the TTSR model, followed by SRGAN, over the other methods. As can be seen in the zoomed region of part of the lung and heart (red box in Fig. 8a) in Fig. 9, GAN-CIRCLE (d) has less noise than the interpolation method (b) but still lacks detail. SRGAN generates a sharp SRCT image (c) at the cost of eliminating the fine structures in the lung and the texture in the heart. The TTSR method shows better structural details (e.g., red arrows) and better-preserved edges (e.g., blue arrows). The TTSR image (e) resembles the HRCT image (a) most, although the texture inside the heart in the TTSR image (with fewer streak artifacts) seems to be different from, and more visually appealing than, the original HRCT. This suggests that TTSR may not benefit from a network pre-trained on totally unrelated data for better CT SR performance [27]. The quantitative results for the region shown in Fig. 9 are listed in Table III. The TTSR model beats all other methods by a large margin in all three metrics.
The performance of GAN-CIRCLE becomes comparable to that of SRGAN, as SRGAN loses many details in this region, which is full of fine structures and textures.
Fig. 7.
HRCT image and SRCT images of the legs from different methods (a: HRCT; b: cubic spline interpolation; c: SRGAN; d: GAN-CIRCLE; e: TTSR). (Display window: [−160, 240] HU)
Fig. 8.
HRCT image and SRCT images of the chest from different methods (a: HRCT; b: cubic spline interpolation; c: SRGAN; d: GAN-CIRCLE; e: TTSR). (Display window: [−1556, 1043] HU)
TABLE II.
Quantitative Results for Different Methods for the patient data
| Methods | PSNR (dB) | SSIM | FSIM |
|---|---|---|---|
| Cubic spline SR | 27.25±0.49 | 0.53±0.04 | 0.89±0.01 |
| SRGAN SR | 29.59±0.70 | 0.61±0.04 | 0.94±0.01 |
| GAN-CIRCLE SR | 27.75±0.41 | 0.54±0.05 | 0.89±0.01 |
| TTSR | 31.16±1.38 | 0.73±0.06 | 0.97±0.01 |
| BM3D denoising | 28.18±0.52 | 0.54±0.04 | 0.88±0.01 |
| GAN-CIRCLE denoising | 29.39±0.51 | 0.58±0.03 | 0.91±0.01 |
Fig. 9.
Zoomed-in HRCT image and SRCT images of the chest from different methods for the red box area in Fig. 8a (a: HRCT; b: cubic spline interpolation; c: SRGAN; d: GAN-CIRCLE; e: TTSR). (Display window: [−1556, 1043] HU)
TABLE III.
Quantitative Results for Different Methods for the patient data in the Zoomed-in range
| Methods | PSNR (dB) | SSIM | FSIM |
|---|---|---|---|
| Cubic spline SR | 27.12 | 0.51 | 0.79 |
| SRGAN | 28.77 | 0.59 | 0.82 |
| GAN-CIRCLE denoising | 28.03 | 0.57 | 0.82 |
| TTSR | 31.39 | 0.81 | 0.98 |
To compare the denoised high-resolution low-dose CT images with TTSR, the corresponding images are shown in Figs. 10 and 11. The BM3D method largely suppresses the noise but leads to over-smoothing, as shown particularly in the zoomed-in view of Fig. 11b. Although GAN-CIRCLE successfully addresses the blurring issue, the appearance of the denoised image differs from the original HRCT, e.g., the contrast and the edge of the heart indicated by the red arrow in Fig. 11. Again, the TTSR SRCT images are closest to the original images compared with the two denoising methods. The quantitative results for BM3D and GAN-CIRCLE denoising of low-dose CT are listed in the bottom two rows of Table II. Both denoising methods of high-resolution low-dose CT images are outperformed by TTSR, although GAN-CIRCLE denoising yields better PSNR, SSIM, and FSIM values than GAN-CIRCLE SR.
Fig. 10.
Comparison of denoised high-resolution low-dose CT and TTSR SRCT images: a) HRCT (512×512 full dose); b) BM3D denoised low-dose CT; c) GAN-CIRCLE denoised LDCT; d) TTSR SRCT. (Display window: [−1556, 1043]HU)
Fig. 11.
Comparison of denoised high-resolution low-dose CT and TTSR SRCT images - zoomed-in views for the red box area in Fig. 8a: a) HRCT (512×512 full dose); b) BM3D denoised low-dose CT; c) GAN-CIRCLE denoised low-dose CT; d) TTSR SRCT. (Display window: [−1556, 1043]HU)
C. Computational efficiency
To compare the computational efficiency of the different methods, Table IV lists the computation times for CGLS reconstruction of a 3D image for one set of patient data (471 slices), denoising, and TTSR. As can be seen from Table IV, TTSR combined with low-resolution reconstruction (CGLS of 128×128 with 4× projection binning) is much less time-consuming than high-resolution low-dose CT denoising (CGLS of 512×512 plus either BM3D or GAN-CIRCLE), yet provides much better image quality, as shown in the previous results.
V. DISCUSSION AND CONCLUSION
The training of TTSR is substantially slower than that of SRGAN and GAN-CIRCLE. It takes six hours for 10 epochs of TTSR training on an NVIDIA A6000 GPU, compared with 16 minutes for SRGAN and two hours for GAN-CIRCLE for the same number of epochs. The high computational efficiency of SRGAN is due to its straightforward generator and discriminator structure. The model complexity of TTSR is also much higher than that of SRGAN and GAN-CIRCLE: TTSR has more than seven million network parameters, while SRGAN has 1.6 million and GAN-CIRCLE fewer than 200 thousand.
In this work, we mainly tuned the learning rate and the number of epochs for the different models empirically for the best PSNR on the test data, while keeping the other hyperparameters the same as in previous publications. This is mainly due to the heavy computational cost of a systematic search for the optimal hyperparameter set, and because the main purpose of this work is to demonstrate the feasibility of TTSR for SRCT. For the phantom data, we found that the TTSR model pre-trained on natural images could facilitate the training of SRCT (10 epochs). However, this was not the case for real patient data. The PSNR of the test patient data along the training epochs of TTSR, trained from the pre-trained model and from scratch (learning rate = 8×10⁻⁵), is plotted in Fig. 12. As can be seen, the pre-trained model does not converge faster than the from-scratch model and seems less stable in the later epochs. This is likely because the difference between real patient data and natural images is bigger than that between phantom data and natural images. We also compared the quantitative metrics and images for the two TTSR models; the from-scratch model is slightly better than the pre-trained model.
Fig. 12.
PSNR of the test patient set for pre-trained and from-scratch TTSR. (Learning rate = 8×10⁻⁵)
We decided not to use a separate validation set, as the datasets, particularly the patient data, are relatively small. Although the performance may be over-estimated due to the involvement of the test data in hyperparameter tuning, the relative ranking of the different deep learning methods should not change. Furthermore, due to the computational hurdle and the limited number of patients, the SR image translation was limited to 2D in this work. Although this may not be an issue for helical CT, where 2D slices are reconstructed, translation in 3D space could further utilize features in the axial direction and lead to better performance.
To evaluate the SR performance, we used engineering metrics, such as PSNR, SSIM, and FSIM, as a preliminary demonstration of the effectiveness of TTSR. TTSR outperforms both SRGAN and GAN-CIRCLE for the patient data, especially for regions with fine structures and textures (Table III). The results in Table II should be interpreted with caution, as large blank regions outside the body may favor SRGAN, which tends to produce over-cleaned images. It is also worth mentioning that the GAN-CIRCLE used in this work corresponds to the unsupervised GAN-CIRCLE ("GAN-CIRCLEu") in [10], i.e., no paired LRCT and HRCT images were used in training. Our quantitative results (PSNR and SSIM) on the Mayo low-dose CT data do not seem to be consistent with those in Table I of [10] (except for the PSNR relationship, i.e., SRGAN has a higher PSNR than GAN-CIRCLEu). This inconsistency may be due to several reasons: 1) the split of training and test data is not the same (unclear in [10]); 2) the super-resolution factor is different (1X in each dimension in [10] and 4X in this work); 3) the reconstruction methods are different (FBP in [10] and CGLS in this work); and 4) the implementations of SRGAN and GAN-CIRCLE may differ, although we downloaded the original code and tuned the hyperparameters based on PSNR. Furthermore, the caveat should be kept in mind that these quantitative metrics may not reflect the real impact of image quality on clinical decision making. Future studies should use task-based evaluation for the relevant clinical tasks, such as liver lesion detection or artery plaque quantification.
TTSR was originally developed for SR of natural images [14], where a detailed ablation study showed that the three loss functions together achieved the best visual results. In this work, we adopted the optimal structure suggested by the original TTSR and tuned the learning rate and the number of epochs for SRCT. In addition, we compared two other deep-learning-based methods, SRGAN and GAN-CIRCLE, as they have been extensively studied and compared for SRCT [10]. As our primary goal is to demonstrate that the current implementation of TTSR can achieve SRCT comparable to or better than other state-of-the-art methods, such as SRGAN and GAN-CIRCLE, we leave a detailed ablation study to future work, which may determine the effectiveness of the different components of TTSR for improved computational efficiency and/or SRCT performance.
In summary, we proposed a TTSR method for low-dose CT super-resolution. TTSR, based on a texture transformer and attention mechanism, effectively improves the spatial resolution and suppresses the noise of low-dose CT images for both phantom and patient data. High-quality super-resolution CT images can be obtained through TTSR even with a much reduced dose (1/4) and fewer projection data (1/4), which lowers both ionizing radiation and computation time. This development could not only contribute to conventional CT super-resolution but also improve image quality using less expensive CT detectors.
VI. Appendix – TTSR network structure
The generator of TTSR is shown in Table AI and the discriminator in Table AII. For more details, please refer to the supplementary material of [14].
Table AI.
The generator structure of TTSR
Table AII.
The discriminator structure of TTSR
Definition of fundamental blocks:
Conv2d: 2D convolution
ReLU: rectified linear unit function
MaxPool2d: 2D max pooling
MeanShift: Subtract channel-wise mean from the input
PixelShuffle: Rearranges elements
SearchTransfer: Sequence of unfold, permute, normalize, multiply, max, unfold, fold functions
ResBlock: Sequence of Conv2d and ReLU functions
LeakyReLU: A type of activation function based on a ReLU, which has a small slope for negative values instead of strict zero
Linear: Linear transformation
REFERENCES
- 1. Greenspan H, Super-Resolution in Medical Imaging. The Computer Journal, 2008. 52(1): p. 43–63.
- 2. Hassan A, Nazir SA, and Alkadhi H, Technical challenges of coronary CT angiography: Today and tomorrow. European Journal of Radiology, 2011. 79(2): p. 161–171.
- 3. Smith EA, et al., Model-based iterative reconstruction: effect on patient radiation dose and image quality in pediatric body CT. Radiology, 2014. 270(2): p. 526–534.
- 4. Yasaka K, et al., High-resolution CT with new model-based iterative reconstruction with resolution preference algorithm in evaluations of lung nodules: Comparison with conventional model-based iterative reconstruction and adaptive statistical iterative reconstruction. European Journal of Radiology, 2016. 85(3): p. 599–606.
- 5. Yang J, et al., Image Super-Resolution Via Sparse Representation. IEEE Transactions on Image Processing, 2010. 19(11): p. 2861–2873.
- 6. Dong W, et al., Image Deblurring and Super-Resolution by Adaptive Sparse Domain Selection and Adaptive Regularization. IEEE Transactions on Image Processing, 2011. 20(7): p. 1838–1857.
- 7. Zhang Y, et al., Reconstruction of super-resolution lung 4D-CT using patch-based sparse representation, in 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012.
- 8. Yu H, et al., Computed tomography super-resolution using convolutional neural networks, in 2017 IEEE International Conference on Image Processing (ICIP). 2017.
- 9. Park J, et al., Computed tomography super-resolution using deep convolutional neural network. Physics in Medicine & Biology, 2018. 63(14): p. 145011.
- 10. You C, et al., CT Super-Resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE). IEEE Transactions on Medical Imaging, 2020. 39(1): p. 188–203.
- 11. Ledig C, et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
- 12. Goodfellow IJ, et al., Generative adversarial nets, in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. 2014, MIT Press: Montreal, Canada. p. 2672–2680.
- 13. Isola P, et al., Image-to-Image Translation with Conditional Adversarial Networks, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
- 14. Yang F, et al., Learning Texture Transformer Network for Image Super-Resolution, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
- 15. Li Z, et al., Investigation of Low-Dose CT Image Denoising Using Unpaired Deep Learning Methods. IEEE Transactions on Radiation and Plasma Medical Sciences, 2021. 5(2): p. 224–234.
- 16. McCollough C, TU-FG-207A-04: Overview of the Low Dose CT Grand Challenge. Medical Physics, 2016. 43(6Part35): p. 3759–3760.
- 17.Huynh-Thu Q and Ghanbari M, Scope of validity of PSNR in image/video quality assessment. Electronics letters, 2008. 44(13): p. 800–801. [Google Scholar]
- 18.Zhou W, et al. , Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004. 13(4): p. 600–612. [DOI] [PubMed] [Google Scholar]
- 19.Horé A and Ziou D. Image Quality Metrics: PSNR vs. SSIM. in 2010 20th International Conference on Pattern Recognition. 2010. [Google Scholar]
- 20.Zhang L, et al. , FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Transactions on Image Processing, 2011. 20(8): p. 2378–2386. [DOI] [PubMed] [Google Scholar]
- 21.Segars WP, et al. , 4D XCAT phantom for multimodality imaging research. Medical Physics, 2010. 37(9): p. 4902–4915. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Han G, Liang Z, and You J. A fast ray-tracing technique for TCT and ECT studies. in 1999 IEEE Nuclear Science Symposium. Conference Record. 1999 Nuclear Science Symposium and Medical Imaging Conference (Cat. No.99CH37019). 1999. [Google Scholar]
- 23.Zhou S, et al. , General simultaneous motion estimation and image reconstruction (G-SMEIR). Biomedical Physics & Engineering Express , 2021. 7(5): p. 055011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wang J and Gu X, Simultaneous motion estimation and image reconstruction (SMEIR) for 4D cone-beam CT. Medical Physics, 2013. 40(10): p. 101912. [DOI] [PubMed] [Google Scholar]
- 25.Qiu W, Titley-Peloquin D, and Soleimani M, Blockwise conjugate gradient methods for image reconstruction in volumetric CT. Computer Methods and Programs in Biomedicine, 2012. 108(2): p. 669–678. [DOI] [PubMed] [Google Scholar]
- 26.Dabov K, et al. , Image denoising with block-matching and 3D filtering. Electronic Imaging 2006. Vol. 6064. 2006: SPIE. [Google Scholar]
- 27.Ginart A, et al. , Making ai forget you: Data deletion in machine learning. Advances in Neural Information Processing Systems, 2019. 32. [Google Scholar]