Biomedical Optics Express. 2022 Oct 13;13(11):5775–5793. doi: 10.1364/BOE.471340

ERA-WGAT: Edge-enhanced residual autoencoder with a window-based graph attention convolutional network for low-dose CT denoising

Han Liu 1, Peixi Liao 2, Hu Chen 1,*, Yi Zhang 1
PMCID: PMC9872905  PMID: 36733738

Abstract

Computed tomography (CT) has become a powerful tool for medical diagnosis. However, minimizing the X-ray radiation risk to the patient makes it challenging to obtain suitable low-dose CT images. Although various low-dose CT methods using deep learning techniques have produced impressive results, convolutional neural network based methods focus on local information and are therefore limited in extracting non-local information. This paper proposes ERA-WGAT, a residual autoencoder incorporating an edge enhancement module that performs convolution with eight types of learnable operators providing rich edge information, and a window-based graph attention convolutional network that combines static and dynamic attention modules to explore non-local self-similarity. We use a compound loss function that combines MSE loss and multi-scale perceptual loss to mitigate the over-smoothing problem. Compared with current low-dose CT denoising methods, ERA-WGAT achieved superior noise suppression and perceived image quality.

1. Introduction

X-ray computed tomography (CT) has been widely used for medical examination and diagnosis. CT images can help achieve more effective medical management by determining when surgery is necessary, reducing exploratory surgeries, improving cancer diagnosis and treatment, etc. They also show surgeons exactly where to operate, greatly improving surgical success. Because different body tissues absorb X-rays differently, CT images can provide vital internal information. However, the cumulative radiation exposure from repeated CT examinations can lead to potential health risks [1,2].

Many algorithms have been proposed to improve image quality for low-dose CT (LDCT), hence lessening radiation exposure for patients, with the underlying approaches categorized as (1) sinogram domain filtration, (2) iterative reconstruction (IR), and (3) image post-processing.

Sinogram filtering methods directly smooth the raw data before image reconstruction, e.g., filtered back projection (FBP). Li et al. [3] investigated a relatively accurate statistical model for sinogram data and developed a penalized likelihood method to smooth the sinogram. Other typical methods include structural adaptive filtering [4], bilateral filtering [5], and penalized weighted least-squares algorithms [6]. Sinogram filtering algorithms are often restricted in practice because the raw projection data are difficult to access.

Iterative reconstruction (IR) algorithms, which iteratively estimate denoised images using priors, have also been proposed for LDCT image denoising, including total variation (TV) and its variants [7-10], non-local means [11-13], dictionary learning [14,15], and other techniques [16-18]. These IR algorithms can significantly improve image quality, but they can lose details and have high computational cost, considerably limiting their practical application.

Post-processing can be applied directly to LDCT images. Previous studies have proposed classical image processing methods, including non-local means filtering [19-21], dictionary learning [22,23], three-dimensional (3D) block matching (BM3D) [24-26], and diffusion filters [27], which are more computationally efficient than IR methods. However, the noise distribution in reconstructed LDCT images is commonly non-uniform, which makes it difficult to obtain valid denoising results.

Convolutional neural networks (CNNs) have recently been shown to be highly effective for image denoising [28-36]. Hence various network architectures have been proposed for LDCT denoising, including two-dimensional (2D) CNNs [28,29], 3D CNNs [30,33,35], residual connections [28,34], cascaded CNNs [31], dense connections [36], and quadratic convolution and deconvolution [34], trained with many different objective functions, including mean squared error (MSE) [28-31,34,36], adversarial loss [30,32,33,35], and perceptual loss [32,33,35,36]. We selected two pixels P and Q to visualize non-local dependencies, as shown in Fig. 1(a). Figure 1(b) and (c) show the pixels related to P and Q, respectively. The pixels of CT images depend not only on local pixels but also on non-local ones. The main drawback of CNN-based methods is that their local convolution kernels cannot capture this non-local self-similarity.

Fig. 1. Non-local dependencies. (a) Two pixels selected for visualization of non-local dependencies. (b) Pixel P’s related pixels. (c) Pixel Q’s related pixels. The darker the color, the stronger the relationships.

In contrast to CNN-based methods, graph convolutional networks (GCNs) have shown great potential in exploring non-local self-similarity for LDCT image denoising [37,38]. However, these GCN-based methods only consider non-local self-similarity between individual pixels, which tends to give unstable performance because of noisy pixels in LDCT images. Moreover, for efficiency reasons these pixel-based GCNs behave differently during training and testing: they use overlapped patches during training, with all graphs constructed inside the patches, whereas during testing they define a search window, roughly comparable to the training patch size, around each pixel. This procedure is suboptimal because some pixels suffer from border effects during training, i.e., their search windows are not centered on them.

Therefore, we propose ERA-WGAT, an edge-enhanced residual autoencoder with a window-based graph attention convolutional network, to perform LDCT denoising, treating non-overlapped windows of fixed size in the feature maps as nodes rather than pixels. The main contributions of this paper are as follows.

  • 1. We propose a conveying path based residual autoencoder and use eight types of learnable operators (vertical, horizontal, diagonal, and anti-diagonal Sobel operators; vertical and horizontal Scharr operators; two types of Laplacian operators) to extract edge information from the input LDCT image, and design an edge branch to provide sufficient edge information for each stage of the encoder.

  • 2. We propose a window-based graph attention convolutional network (WGAT) combining static and dynamic attention modules to explore non-local self-similarity in the encoder, bottleneck, and decoder parts of the proposed model for LDCT images.

  • 3. In WGAT, we treat non-overlapped windows as nodes rather than pixels, adopt a hierarchical structure, and perform WGAT on feature maps of appropriate scale, so that the model behaves identically during training and testing and avoids the border effects that some pixels suffer from in pixel-based GCN methods.

  • 4. Extensive experiments demonstrate that the proposed ERA-WGAT provides superior noise suppression and better image quality compared with several state-of-the-art denoising methods.

The remainder of this paper is organized as follows. Section 2 surveys deep learning based noise suppression methods for LDCT. Section 3 describes the proposed method, which is evaluated and validated in Section 4. Section 5 summarizes and concludes the paper.

2. Related work

Table 1 summarizes a comprehensive comparison between the proposed ERA-WGAT and current deep learning based LDCT denoising methods. The main differences among these models lie in two aspects: network architecture and objective function.

Table 1. Comparison between deep learning based methods. The abbreviations MSE, AL, and PL stand for mean squared error, adversarial loss, and perceptual loss, respectively.

| Method | Conv | Deconv | Shortcut | Edge Enhancement | GCN | MSE | AL | PL |
|---|---|---|---|---|---|---|---|---|
| RED-CNN [28] | Conv2d | Deconv2d | Residual + Skip | | | ✓ | | |
| CNN [29] | Conv2d | | | | | ✓ | | |
| GAN-3D [30] | Conv3d | | Skip | | | ✓ | ✓ | |
| Cascade-CNN [31] | Conv2d | | Cascade | | | ✓ | | |
| WGAN-VGG [32] | Conv2d | | | | | | ✓ | ✓ |
| CPCE-2D [33] | Conv2d | Deconv2d | Concatenation | | | | ✓ | ✓ |
| CPCE-3D [33] | Conv3d | Deconv2d | Concatenation | | | | ✓ | ✓ |
| QAE [34] | Q-Conv2d | Q-Deconv2d | Residual + Skip | | | ✓ | | |
| SACNN [35] | Conv3d | | Skip | | | | ✓ | ✓ |
| EDCNN [36] | Conv2d | | Dense connection | Learnable (4 types) | | ✓ | | ✓ |
| CT-GCN [37] | Conv2d | | Concatenation | Fixed (2 types) | Pixel-based | ✓ | | |
| Chen et al. [38] | Conv2d | Deconv2d | Skip | | Pixel-based | | | |
| ERA-WGAT (Ours) | Conv2d | | Residual + Concatenation | Learnable (8 types) | Window-based | ✓ | | ✓ |

2.1. Network architecture

The key elements of the relevant network architectures include convolutional and deconvolutional layers, shortcut connections, edge enhancement operators, and GCNs.

2.1.1. Convolutional layers

Convolutional layers in deep learning networks can be broadly divided into 2D (Conv2d), quadratic 2D (Q-Conv2d) [34], and 3D (Conv3d) convolutions. Conv2d and Q-Conv2d are performed on a single CT slice; Conv2d is the most commonly employed, whereas Q-Conv2d was recently proposed to enhance individual neuron capability by replacing the inner product with a quadratic operation on the input data. Conv3d is applied to several adjacent slices, incorporating 3D spatial information; typical methods include GAN-3D [30], CPCE-3D [33], and SACNN [35].

2.1.2. Deconvolutional layers

Autoencoder based methods often employ convolutional layers for the encoder and deconvolutional layers for the decoder. However, deconvolution produces uneven overlap when the kernel size is not an integer multiple of the stride, and although selecting a kernel size that is a multiple of the stride helps avoid overlap issues, it remains susceptible to creating checkerboard artifacts [39].

2.1.3. Shortcut connection

Shortcut connections are either skip connections (element-wise addition) or conveying paths (concatenation). A skip connection bypasses non-linear transformations with an identity function, whereas a conveying path reuses early feature maps as input to later layers. Residual mapping, as proposed in RED-CNN [28], is also a skip connection; transforming a direct mapping problem into a residual mapping problem helps avoid gradient vanishing and significantly enhances LDCT imaging performance. Dense connection, e.g., EDCNN [36], conveys the edge enhancement module outputs to each convolution block. Cascade-CNN [31] cascades several CNNs, which are trained individually rather than forming a unified network. The current study employs residual mapping and conveying path based concatenation; the two shortcut styles are sketched below.
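The following minimal PyTorch sketch (illustrative only, not the authors' released code) contrasts the two shortcut styles on dummy tensors; the tensor shapes are arbitrary assumptions.

```python
# Illustrative sketch of the two shortcut styles; shapes are arbitrary.
import torch

x = torch.randn(1, 32, 64, 64)   # early feature map
y = torch.randn(1, 32, 64, 64)   # later feature map of the same shape

# Skip connection: element-wise addition (identity bypass); shape is unchanged.
skip_out = x + y                           # (1, 32, 64, 64)

# Conveying path: channel-wise concatenation; the next convolution can
# reuse the early features directly.
convey_out = torch.cat([x, y], dim=1)      # (1, 64, 64, 64)
```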

2.1.4. Edge enhancement

Edge enhancement modules (e.g., Sobel operators) applied to the input CT image can help extract edge information [36,37]. CT-GCN [37] uses the Sobel edge extractor in the horizontal and vertical directions, whereas EDCNN proposed a trainable Sobel convolution with four types of Sobel operators to extract edge information from the LDCT image: vertical (Fig. 2(a)), horizontal (Fig. 2(b)), diagonal (Fig. 2(c)), and anti-diagonal (Fig. 2(d)). On this basis, we add four further operators to strengthen edge extraction: vertical Scharr (Fig. 2(e)), horizontal Scharr (Fig. 2(f)), and two Laplacian (Fig. 2(g) and (h)) operators. Compared with Sobel operators, Scharr operators differ in two ways: 1) they increase the gap between pixel weights, which can enlarge some edge details; and 2) they increase the weights in the cross directions and weaken those in the diagonal directions, i.e., they pay more attention to directly adjacent pixels. Laplacian operators are second-order differential operators, whereas Sobel and Scharr operators are first-order; Laplacian operators capture finer edge information but are sensitive to noise. We therefore rely on the Laplacian operators to obtain prior information (such as noise and edge details) where the LDCT pixel values change strongly. The learnable parameter λ defined in these trainable operators is adjusted adaptively during training, providing better adaptability and generalization. A minimal sketch of such learnable operators is given below.
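The sketch below is one possible realization (our assumption, not the released implementation) of trainable edge operators: each fixed 3×3 base kernel is multiplied by a learnable scale λ, so the strength of every operator adapts during training.

```python
# A minimal sketch of learnable edge operators: eight fixed 3x3 base kernels,
# each scaled by a learnable parameter lambda (one per operator).
import torch
import torch.nn as nn
import torch.nn.functional as F

BASE_KERNELS = torch.tensor([
    [[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],      # Sobel, vertical
    [[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]],      # Sobel, horizontal
    [[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]],      # Sobel, diagonal
    [[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]],      # Sobel, anti-diagonal
    [[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]],    # Scharr, vertical
    [[-3., -10., -3.], [0., 0., 0.], [3., 10., 3.]],    # Scharr, horizontal
    [[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],        # Laplacian, 4-neighborhood
    [[1., 1., 1.], [1., -8., 1.], [1., 1., 1.]],        # Laplacian, 8-neighborhood
])

class LearnableEdgeConv(nn.Module):
    """Applies the eight trainable operators to a single-channel CT image."""
    def __init__(self):
        super().__init__()
        self.register_buffer("base", BASE_KERNELS.unsqueeze(1))   # (8, 1, 3, 3)
        self.lam = nn.Parameter(torch.ones(8, 1, 1, 1))           # learnable scales

    def forward(self, x):                       # x: (N, 1, H, W)
        kernels = self.lam * self.base          # scaled edge operators
        return F.conv2d(x, kernels, padding=1)  # (N, 8, H, W) edge maps
```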

Fig. 2. Eight types of trainable operators. (a), (b), (c), and (d) are vertical, horizontal, diagonal, and anti-diagonal Sobel operators, respectively. (e) and (f) are vertical and horizontal Scharr operators, respectively. (g) and (h) are two types of Laplacian operators.

2.1.5. GCN

CNNs can only explore local information in LDCT images, whereas non-local self-similarity has been shown to be beneficial for LDCT denoising. In contrast with CNNs, GCNs have been widely used to process non-Euclidean data and have proved effective for real image denoising [40]. Although several previous studies have employed GCNs for LDCT denoising [37,38], they still have two disadvantages. As discussed in Section 1, one is that these methods are pixel-based GCNs, which tend to provide unstable performance due to noisy pixels. The other is that these methods may cause some pixels to suffer from border effects due to the different training and testing behaviors. The proposed ERA-WGAT uses a hierarchical structure and performs non-overlapping window based graph attention on the encoder, bottleneck, and decoder parts to overcome the shortcomings of pixel-based methods.

2.2. Objective function

As an image transformation task, LDCT image denoising uses various objective functions for optimization. Per-pixel losses, e.g., MSE loss, tend to generate over-smoothed images [28] because structural information is not considered. Johnson et al. [41] proposed perceptual loss (PL) to address this problem, calculating similarity between images in a feature space, and several subsequently proposed GAN-based methods combine PL with adversarial loss (AL) to generate more realistic images [32,33,35]. Although these approaches can retain structural information from the original images, they remain poor at suppressing noise. EDCNN [36] proposed a compound loss combining MSE and multi-scale perceptual loss to mitigate over-smoothing. Similar to EDCNN, we use the combination of MSE loss and multi-scale perceptual loss to optimize the proposed model.

3. Method

3.1. Overall architecture

Figure 3 shows that the proposed ERA-WGAT overall architecture comprises four main parts.

Fig. 3. Overall architecture of our proposed network.

3.1.1. Edge enhancement module

We first perform four groups of convolutions with the eight types of learnable operators described above to extract edge information from the input LDCT image. We then use three 3×3 convolutions with stride 2, each followed by a Gaussian error linear unit (GELU [42]), to obtain feature maps of different scales; each convolution halves the size and doubles the channels of the feature map. These feature maps are concatenated with the corresponding feature maps from the encoder, providing sufficient edge information for the encoder. A sketch of this edge branch is given below.
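The following sketch illustrates the edge branch under our own assumptions (the base channel width of 32 and the projection convolution after the operator bank are not specified in the text); it reuses the LearnableEdgeConv sketch above.

```python
# Hedged sketch of the edge branch: edge maps are projected to the feature
# space, then downsampled three times (stride-2 conv + GELU), halving the
# spatial size and doubling the channels each time, yielding one feature map
# per encoder scale. Channel widths are assumptions.
import torch.nn as nn

class EdgeBranch(nn.Module):
    def __init__(self, edge_extractor, base_ch=32):
        super().__init__()
        self.edge = edge_extractor                 # e.g., LearnableEdgeConv above
        self.proj = nn.Sequential(nn.Conv2d(8, base_ch, 3, padding=1), nn.GELU())
        self.down = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(base_ch * 2 ** i, base_ch * 2 ** (i + 1), 3, stride=2, padding=1),
                nn.GELU(),
            )
            for i in range(3)
        ])

    def forward(self, x):                          # x: (N, 1, H, W) LDCT image
        feats = [self.proj(self.edge(x))]          # full-resolution edge features
        for layer in self.down:
            feats.append(layer(feats[-1]))         # three progressively smaller scales
        return feats                               # concatenated with encoder features
```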

3.1.2. Encoder

The encoder comprises five 3×3 convolutions with stride 1, three 3×3 convolutions with stride 2, and a WGAT. Each convolution is followed by a GELU. The first 3×3 convolution with stride 1 maps the input LDCT image to a 32-dimensional feature space. The other 3×3 convolutions with stride 1 perform nonlinear fusion of the feature maps from the edge enhancement module and the encoder at different scales. Each 3×3 convolution with stride 2 halves the size and doubles the channels of the feature map. The WGAT is performed on the feature map generated by the fourth 3×3 convolution with stride 1.

3.1.3. Bottleneck

The bottleneck consists of three 5×5 convolutions with stride 1 and a WGAT. Each convolution is followed by a GELU. We use 5×5 convolutions here to increase the receptive field. The WGAT is performed on the feature map generated by the second 5×5 convolution.

3.1.4. Decoder

The decoder comprises five 3×3 convolutions with stride 1, three upsampling modules, and a WGAT. Each upsampling module consists of a bilinear upsampling layer that doubles the feature map size and a 3×3 convolution with stride 1 that halves the number of feature map channels. Except for the last convolution, all convolutions are followed by a GELU. The four 3×3 convolutions with stride 1 indicated by blue arrows in the decoder in Fig. 3 perform nonlinear fusion of the feature maps from the encoder and decoder at different scales. This not only provides the decoder with early feature map information, but also provides rich edge information. We add residual compensation [28] to transform the direct mapping into a residual mapping problem, employing a 3×3 convolution to map each 32-component feature vector of the feature map to a single channel, followed by element-wise addition between this feature map and the LDCT image. A rectified linear unit (ReLU) is applied to the result to obtain the restored image. The upsampling module and residual head are sketched below.
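A rough sketch of the decoder's upsampling module and the final residual compensation, with channel widths taken as assumptions.

```python
# Sketch of the decoder building blocks: bilinear upsampling doubles the
# spatial size and a 3x3 convolution halves the channels; the residual head
# maps 32-channel features to a 1-channel residual, adds the LDCT input,
# and applies a ReLU to produce the restored image.
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(nn.Conv2d(in_ch, in_ch // 2, 3, padding=1), nn.GELU())

    def forward(self, x):
        return self.conv(self.up(x))

class ResidualHead(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.to_img = nn.Conv2d(ch, 1, 3, padding=1)   # last conv, no GELU
        self.relu = nn.ReLU()

    def forward(self, feat, ldct):                     # ldct: (N, 1, H, W)
        return self.relu(self.to_img(feat) + ldct)     # residual compensation
```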

The following sections describe the WGAT and objective function.

3.2. Window based graph attention convolutional network

Figure 4 shows the proposed WGAT framework. We first construct a graph for the non-overlapped windows of the feature map, then pass the graph to the proposed WGAT to explore non-local self-similarity. This section describes WGAT and related components, including graph construction, input, WGAT, and readout layers.

Fig. 4. The WGAT framework.

3.2.1. Graph construction

The input feature map is first partitioned into non-overlapped windows, and each extracted window is treated as a node. We use the Euclidean distance to find the k nearest neighbors of each node, and the constructed graph is sent to WGAT to explore non-local self-similarity across all windows.

Formally, let $C$, $H$, and $W$ represent the input feature map channels, height, and width, respectively. The feature map is partitioned into a set of non-overlapped windows $F \in \mathbb{R}^{N \times C \times M \times M}$ with window size $M \times M$ and $N = \frac{H}{M} \times \frac{W}{M}$, comprising $N$ windows $F_i \in \mathbb{R}^{C \times M \times M}$. It should be noted that LDCT images are generally equal in height and width, so the window height and width are also set to be equal. We construct a $k$-connected directed graph $G$ for graph attention convolution. The edge weight is computed as

$$\omega_{ji} = \exp\!\left(-\frac{\mathrm{dist}(i,j)}{\sigma^{2}}\right), \qquad (1)$$

where $i, j \in \{1, \dots, N\}$ and $\mathrm{dist}(i,j)$ is the feature distance between nodes $i$ and $j$, calculated as the $\ell_2$-norm distance between the two corresponding windows $F_i$ and $F_j$,

$$\mathrm{dist}(i,j) = \left\| F_i - F_j \right\|_2. \qquad (2)$$

Using (1) to compute $\omega_{ji}$ ensures that the edge weights are always non-negative. The scale parameter $\sigma$ is defined as the average distance to the $k$ nearest neighbors of each node. A minimal sketch of this construction is given below.
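The following is a minimal sketch of the window-based graph construction under our own simplifications (unbatched feature map, brute-force pairwise distances); the function name and argument defaults are illustrative.

```python
# Sketch of graph construction (Eqs. (1)-(2)): split the feature map into
# non-overlapping MxM windows, treat each window as a node, find its k
# nearest neighbors by Euclidean distance, and weight each directed edge
# with a Gaussian of that distance.
import torch

def build_window_graph(feat, window=8, k=8):
    """feat: (C, H, W), with H and W divisible by `window`.
    Returns node features (N, C, M, M), neighbor indices (N, k), weights (N, k)."""
    C, H, W = feat.shape
    win = feat.reshape(C, H // window, window, W // window, window)
    nodes = win.permute(1, 3, 0, 2, 4).reshape(-1, C, window, window)

    flat = nodes.reshape(nodes.shape[0], -1)           # (N, C*M*M)
    dist = torch.cdist(flat, flat)                     # pairwise L2 distances
    dist.fill_diagonal_(float("inf"))                  # exclude self-loops
    knn_dist, knn_idx = dist.topk(k, largest=False)    # k nearest neighbors

    sigma2 = knn_dist.mean(dim=1, keepdim=True) ** 2   # sigma = mean k-NN distance
    weights = torch.exp(-knn_dist / sigma2)            # Eq. (1)
    return nodes, knn_idx, weights
```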

3.2.2. Input layer

For a given graph, we have a node feature $F_i \in \mathbb{R}^{C \times M \times M}$ for each node $i$ and an edge weight $\omega_{ji} \in \mathbb{R}^{1}$. The input feature $F_i$ is embedded into hidden features $S_i^{0}$ and $D_i^{0}$, and $\omega_{ji}$ is embedded into the hidden feature $e_{ji}$ using a simple assignment operation before being passed to the graph convolutional network:

$$S_i^{0} = F_i; \qquad D_i^{0} = F_i; \qquad e_{ji} = \omega_{ji}, \qquad (3)$$

where we omit the superscript of $e_{ji}$, since it remains constant once the graph is constructed; it is used in the static attention module of the WGAT layer.

3.2.3. WGAT layer

Graph convolution updates the hidden state of the current node by aggregating information from its neighbor nodes, so the key point is to aggregate this information efficiently. Veličković et al. [43] proposed using an attention mechanism to perform the aggregation. The basic concept is to assign an attention weight, or importance, to each neighbor, which is then used to weight the neighbor’s influence during aggregation. We combine static and dynamic attention modules to improve performance. Static attention uses the edge hidden embedding $e_{ji}$ as the attention weight, which remains constant once the graph is constructed (see above). We extend the original dynamic attention definition [43] to window aggregation, generating the dynamic attention by self-attention over the nodes at each WGAT layer. The hidden states of the nodes are updated after each WGAT layer, so the attention weights change dynamically.

Formally, let $S_i^{\ell}, D_i^{\ell} \in \mathbb{R}^{C \times M \times M}$ denote the static and dynamic hidden states of node $i$ at WGAT layer $\ell$. The static and dynamic attention modules are introduced in detail below.

Static attention. We use the constant edge weight $e_{ji}$ to calculate the static attention module output $S_i^{\ell+1} \in \mathbb{R}^{C \times M \times M}$:

$$S_i^{\ell+1} = \operatorname{Max}_{j \in \mathcal{N}_i}\left( W_s \ast \left( e_{ji} \cdot S_j^{\ell} \right) + b_s \right), \qquad (4)$$

where $\mathcal{N}_i$ represents the set of neighbor nodes of node $i$, $W_s \in \mathbb{R}^{P^2 \times C \times C}$ and $b_s \in \mathbb{R}^{C}$ denote the weight and bias, respectively, and $\ast$ represents the convolution operator. $P^2$ is the convolution kernel size, and we simply set $P = 3$.

Dynamic attention. The dynamic attention module output $D_i^{\ell+1} \in \mathbb{R}^{C \times M \times M}$ is calculated as:

$$D_i^{\ell+1} = \operatorname{Sum}_{j \in \mathcal{N}_i}\left( \alpha_{ji} \left( W_d \ast D_j^{\ell} + b_d \right) \right), \qquad (5)$$

where $W_d \in \mathbb{R}^{P^2 \times C \times C}$ and $b_d \in \mathbb{R}^{C}$ denote the weight and bias, respectively. $\alpha_{ji}$ is the attention coefficient, defined as:

$$\alpha_{ji} = \frac{\exp(\hat{\alpha}_{ji})}{\sum_{t \in \mathcal{N}_i} \exp(\hat{\alpha}_{ti})}, \qquad (6)$$
$$\hat{\alpha}_{ji} = \mathrm{LR}\!\left( \mathrm{AvgPool}\!\left( W_f \ast \mathrm{Concat}\!\left( W_d \ast D_i^{\ell} + b_d,\; W_d \ast D_j^{\ell} + b_d \right) + b_f \right) \right), \qquad (7)$$

where LR, $W_f \in \mathbb{R}^{P_f^2 \times 1 \times 2C}$, and $b_f \in \mathbb{R}^{1}$ represent LeakyReLU and the weight and bias of the convolution that fuses the transformed hidden features of node $i$ and its neighbor node $j$, respectively. $P_f^2$ is the kernel size of this convolution, which uses a $2 \times 2$ kernel with stride 2 to downsample the concatenated feature maps. The aggregation functions in (4) and (5), Max and Sum, are symmetric, i.e., invariant to input permutations, although any symmetric function would suffice.

The WGAT explores the non-local self-similarity of the feature map, and the two attention modules better aggregate information from neighbor nodes. The number of neighbors for graph construction and the number of WGAT layers are two hyperparameters; we empirically set the number of neighbors to 8 and the number of WGAT layers to 2 for all experiments. A sketch of one WGAT layer is given below.
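The sketch below implements one WGAT layer following Eqs. (4)-(7), with tensor shapes and pooling details filled in by our own assumptions; `nodes`, `nbr_idx`, and `edge_w` follow the graph construction sketch above.

```python
# Compact sketch of one WGAT layer: static attention (Eq. (4)) and dynamic
# attention (Eqs. (5)-(7)) over windows of shape (C, M, M).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WGATLayer(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.w_s = nn.Conv2d(ch, ch, 3, padding=1)       # W_s, b_s of Eq. (4)
        self.w_d = nn.Conv2d(ch, ch, 3, padding=1)       # W_d, b_d of Eq. (5)
        self.w_f = nn.Conv2d(2 * ch, 1, 2, stride=2)     # W_f, b_f of Eq. (7)

    def forward(self, S, D, nbr_idx, edge_w):
        N, C, M, _ = S.shape
        k = nbr_idx.shape[1]

        # Static attention: scale neighbor states by the fixed edge weights,
        # convolve, and max-aggregate over the k neighbors.
        S_nbr = S[nbr_idx]                                        # (N, k, C, M, M)
        w = edge_w.view(N, k, 1, 1, 1)
        s = self.w_s((w * S_nbr).reshape(N * k, C, M, M))
        S_out = s.reshape(N, k, C, M, M).max(dim=1).values        # Eq. (4)

        # Dynamic attention: coefficients from the transformed states of
        # node i and each neighbor j, then a weighted sum over neighbors.
        D_t = self.w_d(D)                                         # (N, C, M, M)
        D_t_nbr = D_t[nbr_idx]                                    # (N, k, C, M, M)
        pair = torch.cat([D_t.unsqueeze(1).expand(-1, k, -1, -1, -1), D_t_nbr], dim=2)
        fused = self.w_f(pair.reshape(N * k, 2 * C, M, M))        # Eq. (7), conv part
        logits = F.leaky_relu(fused.mean(dim=(1, 2, 3))).reshape(N, k)  # AvgPool + LR
        alpha = logits.softmax(dim=1).view(N, k, 1, 1, 1)         # Eq. (6)
        D_out = (alpha * D_t_nbr).sum(dim=1)                      # Eq. (5)
        return S_out, D_out
```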

3.2.4. Readout layer

The final WGAT component is the readout layer. We first reorganize all the windows back into the input feature map format to obtain the static attention module output $S \in \mathbb{R}^{C \times H \times W}$ and the dynamic attention module output $D \in \mathbb{R}^{C \times H \times W}$. The output of the WGAT module, $U \in \mathbb{R}^{C \times H \times W}$, is calculated as:

$$U = W \ast \mathrm{Concat}(S, D) + b, \qquad (8)$$

where $W \in \mathbb{R}^{P^2 \times C \times 2C}$ and $b \in \mathbb{R}^{1}$ represent the weight and bias of the convolution.

More importantly, we propose a fusion strategy for local and non-local feature maps: the local feature map (the WGAT input) and the non-local feature map (the WGAT output) are concatenated and passed through a 3×3 convolution followed by a GELU. This strategy effectively combines local and non-local information to improve the feature extraction ability of the network. The readout and fusion steps are sketched below.
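A sketch of the readout and fusion steps; the window-to-map reorganization mirrors the extraction in the graph construction sketch, and the fusion channel width is our assumption.

```python
# Sketch of the readout layer (Eq. (8)) plus the local/non-local fusion step.
import torch
import torch.nn as nn

def windows_to_map(nodes, H, W):
    """Inverse of the window extraction: (N, C, M, M) -> (C, H, W)."""
    N, C, M, _ = nodes.shape
    grid = nodes.reshape(H // M, W // M, C, M, M).permute(2, 0, 3, 1, 4)
    return grid.reshape(C, H, W)

class Readout(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.out = nn.Conv2d(2 * ch, ch, 3, padding=1)             # W, b of Eq. (8)
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.GELU())

    def forward(self, S_nodes, D_nodes, local_feat):               # local_feat: (C, H, W)
        H, W = local_feat.shape[-2:]
        S = windows_to_map(S_nodes, H, W)                          # static output
        D = windows_to_map(D_nodes, H, W)                          # dynamic output
        U = self.out(torch.cat([S, D], dim=0).unsqueeze(0))        # non-local map U
        fused = self.fuse(torch.cat([local_feat.unsqueeze(0), U], dim=1))
        return fused.squeeze(0)                                    # local + non-local
```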

3.3. Objective function

Previous CNN based methods usually use MSE loss to train the network, i.e., they minimize the pixel-wise error between denoised and NDCT images. However, MSE loss can generate blurry images and cause detail distortion or loss. Perceptual loss was proposed to address this problem, but methods using only perceptual loss are weak in noise suppression. Therefore, we leverage a compound loss that combines the MSE loss and a multi-scale perceptual loss to optimize the proposed ERA-WGAT.

3.3.1. MSE loss

The MSE loss function $L_{\mathrm{mse}}$ can be expressed as:

$$L_{\mathrm{mse}} = \frac{1}{N} \sum_{i=1}^{N} \left\| \phi(X_i) - Y_i \right\|^2, \qquad (9)$$

where $X_i$, $Y_i$, $N$, and $\phi$ represent the LDCT image, the NDCT image, the number of image pairs, and the proposed ERA-WGAT, respectively.

3.3.2. Multi-scale perceptual loss

The multi-scale perceptual loss $L_{\mathrm{per}}$ is computed as:

$$L_{\mathrm{per}} = \frac{1}{NS} \sum_{i=1}^{N} \sum_{s=1}^{S} \left\| \gamma_s(\phi(X_i)) - \gamma_s(Y_i) \right\|^2, \qquad (10)$$

where $\gamma_s$ denotes the feature map of the pretrained VGG19 [44] at scale $s$ and $S$ is the number of scales. In the experiments, $\gamma$ takes the outputs of the fourth, eighth, 12th, and 16th convolutions in the VGG network. We duplicated the CT images to create RGB channels, since $\gamma$ takes color image inputs whereas CT images are greyscale.

3.3.3. Compound loss

The compound loss $L_{\mathrm{comp}}$ can be expressed as:

$$L_{\mathrm{comp}} = L_{\mathrm{mse}} + \alpha L_{\mathrm{per}}, \qquad (11)$$

where $\alpha$ is a hyperparameter that balances the two components. A sketch of this compound loss is given below.
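The following sketch implements the compound loss under our assumptions; the torchvision layer indices used to tap the fourth, eighth, 12th, and 16th convolutions of VGG19 are our own mapping and should be checked against the intended layers.

```python
# Hedged sketch of the compound loss (Eqs. (9)-(11)): MSE plus a multi-scale
# perceptual term computed from intermediate VGG19 feature maps.
import torch
import torch.nn as nn
from torchvision import models

class CompoundLoss(nn.Module):
    def __init__(self, alpha=1e-4, tap_layers=(7, 16, 25, 34)):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.tap = vgg, set(tap_layers)   # assumed indices of the tapped convs
        self.alpha = alpha
        self.mse = nn.MSELoss()

    def _features(self, x):
        x = x.repeat(1, 3, 1, 1)                    # grayscale CT -> 3-channel input
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.tap:
                feats.append(x)
        return feats

    def forward(self, denoised, ndct):
        loss = self.mse(denoised, ndct)             # Eq. (9)
        for fd, fn in zip(self._features(denoised), self._features(ndct)):
            loss = loss + self.alpha * self.mse(fd, fn)   # Eqs. (10)-(11), per scale
        return loss
```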

4. Experimental design and results

This section details the datasets used to train and evaluate the networks, discusses hyperparameter selection and ablation experiments on the different modules, and compares the proposed ERA-WGAT approach with recent state-of-the-art CT image denoising methods.

4.1. Data sources

We used a real clinical dataset from the 2016 NIH AAPM-Mayo Clinic Low-Dose CT Grand Challenge [45] by Mayo Clinic for the training and evaluation of the proposed ERA-WGAT. The dataset contains 10 anonymous patients’ normal-dose abdominal CT images and simulated quarter-dose CT images. All the networks were trained with a subset of full and quarter dose image pairs (4,736 images from 8 patients), and tested with the remaining pairs (896 images from 2 patients).

4.2. Parameter selection

We experimentally evaluated several parameter combinations and finalized the settings as follows. We use the original resolution of the images, i.e., 512×512. Networks were trained for 60 epochs with a base learning rate of $10^{-4}$, multiplied by 0.1 at the 20th and 40th epochs. The number of WGAT layers was 2, with a window size of 8×8 in the encoder and decoder and 4×4 in the bottleneck, and 8 neighbors for graph construction. We used the AdamW [46] algorithm with weight decay $10^{-3}$ to optimize the network. All networks were implemented in PyTorch on two NVIDIA RTX 2080Ti GPUs. A minimal training-loop sketch with these settings is given below.
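A minimal training-loop sketch matching the stated settings; `model`, `compound_loss`, and `train_loader` are placeholders for the network, the loss above, and a loader of LDCT/NDCT pairs.

```python
# Sketch of the optimization setup: AdamW (lr 1e-4, weight decay 1e-3),
# 60 epochs, learning rate multiplied by 0.1 at epochs 20 and 40.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)

for epoch in range(60):
    for ldct, ndct in train_loader:          # 512x512 full-resolution image pairs
        optimizer.zero_grad()
        loss = compound_loss(model(ldct), ndct)
        loss.backward()
        optimizer.step()
    scheduler.step()
```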

We employed four metrics to quantitatively evaluate network performance: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and VGG-P. RMSE and PSNR focus on pixel-level similarity, SSIM on structural similarity, and VGG-P is the commonly used perceptual distance based on VGG19.

To determine the weighting parameter $\alpha$ for the compound loss, we selected $\alpha$ from $\{0, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}\}$. Table 2 summarizes the quantitative results for different values of $\alpha$ on the test set. $\alpha = 10^{-4}$ achieved the best trade-off across all the metrics, so we set $\alpha = 10^{-4}$ in the following experiments.

Table 2. Quantitative results (mean ± SD) associated with different values of the parameter $\alpha$ in the compound loss of the proposed ERA-WGAT for the images in the test set.

| $\alpha$ | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ |
|---|---|---|---|---|
| $10^{-1}$ | 0.0073 ± 0.0011 | 42.8483 ± 1.2752 | 0.9620 ± 0.0125 | 0.0342 ± 0.0150 |
| $10^{-2}$ | 0.0070 ± 0.0011 | 43.1778 ± 1.2578 | 0.9647 ± 0.0110 | 0.0330 ± 0.0157 |
| $10^{-3}$ | 0.0066 ± 0.0010 | 43.6752 ± 1.2480 | 0.9682 ± 0.0102 | 0.0335 ± 0.0160 |
| $10^{-4}$ | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 |
| $10^{-5}$ | 0.0063 ± 0.0009 | 44.1520 ± 1.2352 | 0.9717 ± 0.0092 | 0.1201 ± 0.0604 |
| 0 | 0.0063 ± 0.0009 | 44.1549 ± 1.2377 | 0.9717 ± 0.0092 | 0.1346 ± 0.0701 |

For comparison with the proposed ERA-WGAT, we employed BM3D [24], RED-CNN [28], WGAN-VGG [32], CPCE-2D [33], EDCNN [36], QAE [34], and CT-GCN [37] models.

4.3. Experimental results

4.3.1. Denoising performance comparison

Two representative patients’ scans from the test set were chosen to visualize the denoising performance. Figure 5 and Fig. 7 show the visualization results for the two slices. Figure 5(a) and Fig. 7(a) are the NDCT images, and Fig. 5(b) and Fig. 7(b) are the LDCT images. Figure 5(c) and Fig. 7(c) show the denoising results of BM3D, a classical image processing method. Figure 5(d), (h), (i) and Fig. 7(d), (h), (i) show the results of methods using only MSE loss, including RED-CNN, QAE, and CT-GCN. Figure 5(e), (f) and Fig. 7(e), (f) show the results of GAN-framework models using perceptual loss, including WGAN-VGG and CPCE-2D. Figure 5(g), (j) and Fig. 7(g), (j) show the results of models using the compound loss combining MSE loss and perceptual loss, including EDCNN and our proposed ERA-WGAT. For a clearer comparison, we selected two enlarged regions of interest (RoIs), marked by the blue rectangles in Fig. 5 and Fig. 7, and a complex structure marked by the yellow rectangle in Fig. 7.

Fig. 5. Results from the abdominal image with a metastasis in the liver for comparison. The region of interest (RoI) in the blue box is selected and magnified for a clearer comparison.

Fig. 7. Results from the abdominal image with a metastasis in the liver for comparison. The regions of interest (RoIs) in the blue and yellow boxes are selected and magnified for a clearer comparison.

BM3D blurred the low-contrast lesions marked by the red circles and caused many artifacts in the RoIs marked by the blue rectangles in Fig. 5(c) and Fig. 7(c). RED-CNN showed good denoising performance in the enlarged RoIs marked by the blue rectangles in Fig. 5(d) and Fig. 7(d), but produced over-smoothed results (reduced image contrast) and lost some texture information (some vessels indicated by the orange arrows in Fig. 5(d) and Fig. 7(d)) compared with the NDCT images. The reason is that MSE loss minimizes the pixel-level average error and often causes over-smoothing. The other two methods using MSE loss, QAE and CT-GCN, retained more structural details through new network architectures (QAE uses quadratic convolution; CT-GCN uses a graph convolutional network), but still generated over-smoothed results. WGAN-VGG and CPCE-2D use VGG perceptual loss to ease this problem, but they are poor at suppressing noise, as shown in Fig. 5(e), (f) and Fig. 7(e), (f). EDCNN uses the compound loss combining MSE and perceptual loss to balance noise reduction and structure preservation, as shown in Fig. 5(g) and Fig. 7(g); its noise removal is better than that of WGAN-VGG and CPCE-2D, but it still has shortcomings in detail preservation (vessels indicated by the orange arrow in Fig. 7(g)), and at the positions indicated by the green arrows in Fig. 5(g) and Fig. 7(g) the vessels look somewhat vague and some are hardly identifiable. Our proposed ERA-WGAT clearly obtained the best performance in terms of both noise suppression and structure preservation, as shown in Fig. 5(j) and Fig. 7(j). At the positions indicated by the orange and green arrows, ERA-WGAT outperformed the other methods in both vessel preservation and vessel brightness maintenance. Moreover, in the enlarged RoI marked by the yellow rectangle in Fig. 7, ERA-WGAT gave the best structure preservation among all the methods. The compound loss helped ERA-WGAT avoid the over-smoothing problem, and its strong retention of detail and structural information stems from the rich edge information provided by the edge enhancement module and the non-local information provided by the window-based graph attention convolutional network (WGAT). Table 3 summarizes the quantitative results for the two images; ERA-WGAT achieved better performance than the other methods on most of the metrics.

Table 3. Quantitative results associated with different algorithms for Fig. 5 and Fig. 7.

| Method | Fig. 5 RMSE ↓ | Fig. 5 PSNR ↑ | Fig. 5 SSIM ↑ | Fig. 5 VGG-P ↓ | Fig. 7 RMSE ↓ | Fig. 7 PSNR ↑ | Fig. 7 SSIM ↑ | Fig. 7 VGG-P ↓ |
|---|---|---|---|---|---|---|---|---|
| LDCT | 0.0143 | 36.8973 | 0.8624 | 0.6772 | 0.0171 | 35.3317 | 0.8296 | 0.5979 |
| BM3D | 0.0083 | 41.6485 | 0.9387 | 0.3975 | 0.0105 | 39.5418 | 0.9153 | 0.2810 |
| RED-CNN | 0.0074 | 42.6343 | 0.9584 | 0.2973 | 0.0092 | 40.7403 | 0.9418 | 0.2236 |
| WGAN-VGG | 0.0100 | 39.9989 | 0.9273 | 0.1495 | 0.0125 | 38.0954 | 0.9004 | 0.1628 |
| CPCE-2D | 0.0098 | 40.2085 | 0.9372 | 0.1079 | 0.0121 | 38.3184 | 0.9098 | 0.1455 |
| EDCNN | 0.0080 | 41.8843 | 0.9514 | 0.1039 | 0.0100 | 39.9913 | 0.9322 | 0.1046 |
| QAE | 0.0077 | 42.2746 | 0.9560 | 0.3094 | 0.0097 | 40.2299 | 0.9350 | 0.2291 |
| CT-GCN | 0.0076 | 42.3923 | 0.9567 | 0.3127 | 0.0095 | 40.4406 | 0.9384 | 0.2237 |
| ERA-WGAT | 0.0073 | 42.7222 | 0.9599 | 0.1029 | 0.0090 | 40.8842 | 0.9446 | 0.1377 |

To further show the merits of the proposed ERA-WGAT, we provide the absolute difference images relative to the NDCT images of Fig. 5 and Fig. 7 in Fig. 6 and Fig. 8, respectively. ERA-WGAT clearly yielded the smallest difference from the NDCT images.

Fig. 6. Absolute difference images relative to the NDCT image of Fig. 5.

Fig. 8. Absolute difference images relative to the NDCT image of Fig. 7.

4.3.2. Quantitative results

Table 4 summarizes the comparison results for the different algorithms on the test set. Our proposed ERA-WGAT performed better than the other methods on most of the metrics. It should be noted that since the methods involved in the comparison use different loss functions, the results in Table 4 may not compare the methods fairly. Therefore, to compare these methods fairly, we retrained them using the same loss functions (since WGAN-VGG and CPCE-2D are GAN-based methods, we only included their generators in the comparison). Table 5 and Table 6 summarize the quantitative results of the different algorithms retrained using the MSE loss and the compound loss, respectively. ERA-WGAT achieved the best results on all the metrics.

Table 4. Quantitative results (mean ± SD) associated with different algorithms for the images in the test set.

| Method | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ |
|---|---|---|---|---|
| LDCT | 0.0117 ± 0.0022 | 38.8083 ± 1.5995 | 0.9054 ± 0.0311 | 0.3376 ± 0.1595 |
| BM3D | 0.0075 ± 0.0010 | 42.6105 ± 1.0930 | 0.9553 ± 0.0135 | 0.2409 ± 0.0735 |
| RED-CNN | 0.0064 ± 0.0009 | 43.9140 ± 1.1936 | 0.9702 ± 0.0096 | 0.1416 ± 0.0672 |
| WGAN-VGG | 0.0090 ± 0.0011 | 40.9629 ± 1.0069 | 0.9482 ± 0.0166 | 0.0750 ± 0.0342 |
| CPCE-2D | 0.0087 ± 0.0012 | 41.2704 ± 1.1064 | 0.9547 ± 0.0150 | 0.0621 ± 0.0309 |
| EDCNN | 0.0070 ± 0.0010 | 43.2371 ± 1.2320 | 0.9652 ± 0.0110 | 0.0590 ± 0.0232 |
| QAE | 0.0068 ± 0.0010 | 43.4866 ± 1.1837 | 0.9676 ± 0.0103 | 0.1641 ± 0.0628 |
| CT-GCN | 0.0067 ± 0.0009 | 43.5646 ± 1.1426 | 0.9682 ± 0.0098 | 0.1585 ± 0.0621 |
| ERA-WGAT | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 |
Table 5. Quantitative results (mean ± SD) associated with different algorithms using the mean squared error loss for the images in the test set.

| Method | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ |
|---|---|---|---|---|
| LDCT | 0.0117 ± 0.0022 | 38.8083 ± 1.5995 | 0.9054 ± 0.0311 | 0.3376 ± 0.1595 |
| RED-CNN | 0.0064 ± 0.0009 | 43.9140 ± 1.1936 | 0.9702 ± 0.0096 | 0.1416 ± 0.0672 |
| WGAN-VGG | 0.0068 ± 0.0010 | 43.4073 ± 1.1547 | 0.9675 ± 0.0101 | 0.1653 ± 0.0625 |
| CPCE-2D | 0.0068 ± 0.0010 | 43.3890 ± 1.1464 | 0.9674 ± 0.0100 | 0.1660 ± 0.0596 |
| EDCNN | 0.0066 ± 0.0009 | 43.6518 ± 1.1653 | 0.9685 ± 0.0099 | 0.1465 ± 0.0578 |
| QAE | 0.0068 ± 0.0010 | 43.4866 ± 1.1837 | 0.9676 ± 0.0103 | 0.1641 ± 0.0628 |
| CT-GCN | 0.0067 ± 0.0009 | 43.5646 ± 1.1426 | 0.9682 ± 0.0098 | 0.1585 ± 0.0621 |
| ERA-WGAT | 0.0063 ± 0.0009 | 44.1549 ± 1.2377 | 0.9717 ± 0.0092 | 0.1346 ± 0.0701 |
Table 6. Quantitative results (mean ± SD) associated with different algorithms using the compound loss for the images in the test set.

| Method | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ |
|---|---|---|---|---|
| LDCT | 0.0117 ± 0.0022 | 38.8083 ± 1.5995 | 0.9054 ± 0.0311 | 0.3376 ± 0.1595 |
| RED-CNN | 0.0064 ± 0.0009 | 43.8987 ± 1.2002 | 0.9701 ± 0.0094 | 0.0627 ± 0.0281 |
| WGAN-VGG | 0.0069 ± 0.0010 | 43.2901 ± 1.2066 | 0.9666 ± 0.0109 | 0.0805 ± 0.0320 |
| CPCE-2D | 0.0069 ± 0.0010 | 43.3111 ± 1.2079 | 0.9665 ± 0.0108 | 0.0838 ± 0.0307 |
| EDCNN | 0.0067 ± 0.0010 | 43.5123 ± 1.1913 | 0.9673 ± 0.0103 | 0.0686 ± 0.0267 |
| QAE | 0.0068 ± 0.0010 | 43.4221 ± 1.1803 | 0.9666 ± 0.0103 | 0.0777 ± 0.0289 |
| CT-GCN | 0.0068 ± 0.0010 | 43.4128 ± 1.2013 | 0.9668 ± 0.0106 | 0.0676 ± 0.0257 |
| ERA-WGAT | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 |

4.3.3. Ablation study of proposed method

This section investigates ERA-WGAT performance under different model structure configurations. We designed a residual autoencoder (RA) by removing the edge enhancement module and WGAT from the structure presented in Fig. 3, and then separately added the edge enhancement module (ERA), WGAT with the static attention module (RA-WGAT(sa)), and WGAT with the dynamic attention module (RA-WGAT(da)). We also added WGAT with static (ERA-WGAT(sa)) and with static and dynamic (ERA-WGAT(sa+da)) attention modules to ERA. All models were trained using the compound loss with the same training strategy and datasets as used previously. Table 7 summarizes the quantitative results for the various models. The complete ERA-WGAT model (ERA-WGAT(sa+da)) achieved the best results on all the metrics, with each additional component providing a significant performance improvement. Figure 9 shows the compound loss value on the testing dataset during training under the different model structure configurations; the loss decreases steadily as the edge enhancement module and the WGAT static and dynamic attention modules are added.

Table 7. Quantitative results (mean ± SD) of ablation experiments on different modules in our method for the images in the test set. The abbreviations E, SA, and DA stand for the edge enhancement module, static attention module, and dynamic attention module, respectively.

| Model | E | SA | DA | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ | Params ↓ | FLOPs ↓ |
|---|---|---|---|---|---|---|---|---|---|
| RA | | | | 0.0066 ± 0.0009 | 43.6962 ± 1.1380 | 0.9696 ± 0.0093 | 0.0801 ± 0.0307 | 7.26M | 58.05G |
| ERA | ✓ | | | 0.0064 ± 0.0010 | 44.0277 ± 1.2388 | 0.9710 ± 0.0094 | 0.0642 ± 0.0314 | 9.21M | 81.01G |
| RA-WGAT(sa) | | ✓ | | 0.0064 ± 0.0009 | 43.9249 ± 1.1933 | 0.9705 ± 0.0093 | 0.0719 ± 0.0304 | 10.80M | 188.51G |
| RA-WGAT(da) | | | ✓ | 0.0065 ± 0.0009 | 43.8108 ± 1.1604 | 0.9700 ± 0.0092 | 0.0770 ± 0.0308 | 10.82M | 189.27G |
| ERA-WGAT(sa) | ✓ | ✓ | | 0.0063 ± 0.0010 | 44.0774 ± 1.2386 | 0.9712 ± 0.0093 | 0.0622 ± 0.0300 | 12.75M | 211.46G |
| ERA-WGAT(sa+da) | ✓ | ✓ | ✓ | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 | 16.30M | 240.62G |
Fig. 9. Compound loss value on the testing dataset during training under different model structure configurations.

Table 8 summarizes the quantitative results of ablation experiments on the proposed edge branch for the images in the test set. The edge branch improves the performance of the model on all metrics, because it delivers rich edge information to all parts of the encoder.

Table 8. Quantitative results (mean ± SD) of ablation experiments on the edge branch in our method for the images in the test set.

| Edge Branch | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ |
|---|---|---|---|---|
| without | 0.0064 ± 0.0009 | 44.0326 ± 1.2314 | 0.9710 ± 0.0094 | 0.0649 ± 0.0318 |
| with | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 |

Table 9 summarizes the quantitative results for different numbers of WGAT layers in our method for the images in the test set; there is no significant difference in RMSE, SSIM, and VGG-P. The proposed ERA-WGAT achieved the best performance in PSNR, Params, and FLOPs when the number of WGAT layers is 2. As the number of WGAT layers increases, the Params and FLOPs increase. The basic concept behind WGAT is that every node aggregates information from its neighborhood, so each node embedding contains information from increasingly distant parts of the graph as the WGAT layers progress, i.e., every node embedding contains information about its $L$-hop neighborhood after $L$ WGAT layers. With 2 WGAT layers we already obtain very rich non-local information; increasing the number of WGAT layers not only increases the Params and FLOPs, but also essentially smooths the information of all nodes. We therefore set the number of WGAT layers to 2 for the best performance.

Table 9. Quantitative results (mean ± SD) for different numbers of WGAT layers in our method for the images in the test set.

| WGAT layers | RMSE ↓ | PSNR ↑ | SSIM ↑ | VGG-P ↓ | Params ↓ | FLOPs ↓ |
|---|---|---|---|---|---|---|
| 2 | 0.0063 ± 0.0010 | 44.0886 ± 1.2419 | 0.9712 ± 0.0093 | 0.0619 ± 0.0300 | 16.30M | 240.62G |
| 3 | 0.0063 ± 0.0010 | 44.0808 ± 1.2403 | 0.9712 ± 0.0093 | 0.0620 ± 0.0298 | 18.08M | 305.94G |
| 4 | 0.0063 ± 0.0010 | 44.0761 ± 1.2401 | 0.9711 ± 0.0093 | 0.0614 ± 0.0292 | 19.85M | 371.25G |
| 5 | 0.0063 ± 0.0010 | 44.0699 ± 1.2389 | 0.9711 ± 0.0094 | 0.0618 ± 0.0298 | 21.63M | 436.57G |
| 6 | 0.0063 ± 0.0010 | 44.0760 ± 1.2400 | 0.9711 ± 0.0093 | 0.0616 ± 0.0296 | 23.40M | 501.88G |

5. Conclusion

We proposed ERA-WGAT, a residual autoencoder incorporating an edge enhancement module providing edge information and a window-based graph attention convolutional network (WGAT) exploring non-local self-similarity. In terms of both denoising performance and quantitative metrics, ERA-WGAT achieved superior performance compared with current state-of-the-art methods. The ablation experiments proved the effectiveness of the proposed components, including edge enhancement, the edge branch, and WGAT with static and dynamic attention. In the experimental results, we found that the FLOPs of the proposed WGAT are somewhat large, which is mainly caused by the graph construction part. As mentioned earlier, previous GCN methods use patches during training to reduce the computational burden, which leads to inconsistent behavior between training and testing; because GCNs are not translation invariant, this causes boundary effects in the denoising results. Traditional GCN methods still require a huge amount of computation because they treat each pixel as a node, so the number of nodes in their graphs is much larger than in our proposed WGAT. Although we have solved the boundary effect problem and reduced the number of nodes in the graph, the FLOPs of our method are still larger than those of CNN methods. We do not shy away from this problem, but take it as one of the directions for future improvement. Future work will also extend the WGAT module to 3D to explore non-local information between adjacent slices, and will consider extracting non-local information at different scales to further improve denoising performance.

Funding

National Natural Science Foundation of China (61871277); Sichuan Province Science and Technology Support Program (2022JDJQ0045, 2021JDJQ0024, 2019YFH0193); Chengdu Science and Technology Program (2018YF0500069SN).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in [45].

References

  • 1. Smith-Bindman R., Lipson J., Marcus R., Kim K.-P., Mahesh M., Gould R., De González A. B., Miglioretti D. L., "Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer," Arch. Intern. Med. 169(22), 2078–2086 (2009). doi: 10.1001/archinternmed.2009.427
  • 2. De González A. B., Mahesh M., Kim K.-P., Bhargavan M., Lewis R., Mettler F., Land C., "Projected cancer risks from computed tomographic scans performed in the United States in 2007," Arch. Intern. Med. 169(22), 2071–2077 (2009). doi: 10.1001/archinternmed.2009.440
  • 3. Li T., Li X., Wang J., Wen J., Lu H., Hsieh J., Liang Z., "Nonlinear sinogram smoothing for low-dose x-ray CT," IEEE Trans. Nucl. Sci. 51(5), 2505–2513 (2004). doi: 10.1109/TNS.2004.834824
  • 4. Balda M., Hornegger J., Heismann B., "Ray contribution masks for structure adaptive sinogram filtering," IEEE Trans. Med. Imaging 31(6), 1228–1239 (2012). doi: 10.1109/TMI.2012.2187213
  • 5. Manduca A., Yu L., Trzasko J. D., Khaylova N., Kofler J. M., McCollough C. M., Fletcher J. G., "Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT," Med. Phys. 36(11), 4911–4919 (2009). doi: 10.1118/1.3232004
  • 6. Wang J., Li T., Lu H., Liang Z., "Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose x-ray computed tomography," IEEE Trans. Med. Imaging 25(10), 1272–1283 (2006). doi: 10.1109/TMI.2006.882141
  • 7. Sidky E. Y., Pan X., "Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization," Phys. Med. Biol. 53(17), 4777–4807 (2008). doi: 10.1088/0031-9155/53/17/021
  • 8. Zhang Y., Zhang W.-H., Chen H., Yang M.-L., Li T.-Y., Zhou J.-L., "Few-view image reconstruction combining total variation and a high-order norm," Int. J. Imaging Syst. Technol. 23, 249–255 (2013). doi: 10.1002/ima.22058
  • 9. Zhang Y., Zhang W., Lei Y., Zhou J., "Few-view image reconstruction with fractional-order total variation," J. Opt. Soc. Am. A 31(5), 981–995 (2014). doi: 10.1364/JOSAA.31.000981
  • 10. Zhang Y., Wang Y., Zhang W., Lin F., Pu Y., Zhou J., "Statistical iterative reconstruction using adaptive fractional order regularization," Biomed. Opt. Express 7(3), 1015–1029 (2016). doi: 10.1364/BOE.7.001015
  • 11. Chen Y., Gao D., Nie C., Luo L., Chen W., Yin X., Lin Y., "Bayesian statistical reconstruction for low-dose x-ray computed tomography using an adaptive-weighting nonlocal prior," Comput. Med. Imaging Graph. 33(7), 495–500 (2009). doi: 10.1016/j.compmedimag.2008.12.007
  • 12. Ma J., Zhang H., Gao Y., Huang J., Liang Z., Feng Q., Chen W., "Iterative image reconstruction for cerebral perfusion CT using a pre-contrast scan induced edge-preserving prior," Phys. Med. Biol. 57(22), 7519–7542 (2012). doi: 10.1088/0031-9155/57/22/7519
  • 13. Zhang Y., Xi Y., Yang Q., Cong W., Zhou J., Wang G., "Spectral CT reconstruction with image sparsity and spectral mean," IEEE Trans. Comput. Imaging 2(4), 510–523 (2016). doi: 10.1109/TCI.2016.2609414
  • 14. Xu Q., Yu H., Mou X., Zhang L., Hsieh J., Wang G., "Low-dose x-ray CT reconstruction via dictionary learning," IEEE Trans. Med. Imaging 31(9), 1682–1697 (2012). doi: 10.1109/TMI.2012.2195669
  • 15. Zhang Y., Mou X., Wang G., Yu H., "Tensor-based dictionary learning for spectral CT reconstruction," IEEE Trans. Med. Imaging 36(1), 142–154 (2017). doi: 10.1109/TMI.2016.2600249
  • 16. Yan M., Chen J., Vese L. A., Villasenor J., Bui A., Cong J., "EM+TV based reconstruction for cone-beam CT with reduced radiation," in International Symposium on Visual Computing (Springer, 2011), pp. 1–10.
  • 17. Hammernik K., Würfl T., Pock T., Maier A., "A deep learning architecture for limited-angle computed tomography reconstruction," in Bildverarbeitung für die Medizin 2017 (Springer, 2017), pp. 92–97.
  • 18. Adler J., Öktem O., "Learned primal-dual reconstruction," IEEE Trans. Med. Imaging 37(6), 1322–1332 (2018). doi: 10.1109/TMI.2018.2799231
  • 19. Kelm Z. S., Blezek D., Bartholmai B., Erickson B. J., "Optimizing non-local means for denoising low dose CT," in 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (IEEE, 2009), pp. 662–665.
  • 20. Ma J., Huang J., Feng Q., Zhang H., Lu H., Liang Z., Chen W., "Low-dose computed tomography image restoration using previous normal-dose scan," Med. Phys. 38(10), 5713–5731 (2011). doi: 10.1118/1.3638125
  • 21. Li Z., Yu L., Trzasko J. D., Lake D. S., Blezek D. J., Fletcher J. G., McCollough C. H., Manduca A., "Adaptive nonlocal means filtering based on local noise level for CT denoising," Med. Phys. 41(1), 011908 (2013). doi: 10.1118/1.4851635
  • 22. Aharon M., Elad M., Bruckstein A., "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). doi: 10.1109/TSP.2006.881199
  • 23. Chen Y., Yin X., Shi L., Shu H., Luo L., Coatrieux J.-L., Toumoulin C., "Improving abdomen tumor low-dose CT images using a fast dictionary learning based processing," Phys. Med. Biol. 58(16), 5803–5820 (2013). doi: 10.1088/0031-9155/58/16/5803
  • 24. Feruglio P. F., Vinegoni C., Gros J., Sbarbati A., Weissleder R., "Block matching 3D random noise filtering for absorption optical projection tomography," Phys. Med. Biol. 55(18), 5401–5415 (2010). doi: 10.1088/0031-9155/55/18/009
  • 25. Kang D., Slomka P., Nakazato R., Woo J., Berman D. S., Kuo C.-C. J., Dey D., "Image denoising of low-radiation dose coronary CT angiography by an adaptive block-matching 3D algorithm," in Medical Imaging 2013: Image Processing, vol. 8669 (International Society for Optics and Photonics, 2013), p. 86692G.
  • 26. Sheng K., Gou S., Wu J., Qi S. X., "Denoised and texture enhanced MVCT to improve soft tissue conspicuity," Med. Phys. 41(10), 101916 (2014). doi: 10.1118/1.4894714
  • 27. Mendrik A. M., Vonken E.-J., Rutten A., Viergever M. A., van Ginneken B., "Noise reduction in computed tomography scans using 3-D anisotropic hybrid diffusion with continuous switch," IEEE Trans. Med. Imaging 28(10), 1585–1594 (2009). doi: 10.1109/TMI.2009.2022368
  • 28. Chen H., Zhang Y., Kalra M. K., Lin F., Chen Y., Liao P., Zhou J., Wang G., "Low-dose CT with a residual encoder-decoder convolutional neural network," IEEE Trans. Med. Imaging 36(12), 2524–2535 (2017). doi: 10.1109/TMI.2017.2715284
  • 29. Chen H., Zhang Y., Zhang W., Liao P., Li K., Zhou J., Wang G., "Low-dose CT via convolutional neural network," Biomed. Opt. Express 8(2), 679–694 (2017). doi: 10.1364/BOE.8.000679
  • 30. Wolterink J. M., Leiner T., Viergever M. A., Išgum I., "Generative adversarial networks for noise reduction in low-dose CT," IEEE Trans. Med. Imaging 36(12), 2536–2545 (2017). doi: 10.1109/TMI.2017.2708987
  • 31. Wu D., Kim K., Fakhri G. E., Li Q., "A cascaded convolutional neural network for x-ray low-dose CT image denoising," arXiv preprint arXiv:1705.04267 (2017).
  • 32. Yang Q., Yan P., Zhang Y., Yu H., Shi Y., Mou X., Kalra M. K., Zhang Y., Sun L., Wang G., "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Trans. Med. Imaging 37(6), 1348–1357 (2018). doi: 10.1109/TMI.2018.2827462
  • 33. Shan H., Zhang Y., Yang Q., Kruger U., Kalra M. K., Sun L., Cong W., Wang G., "3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network," IEEE Trans. Med. Imaging 37(6), 1522–1534 (2018). doi: 10.1109/TMI.2018.2832217
  • 34. Fan F., Shan H., Kalra M. K., Singh R., Qian G., Getzin M., Teng Y., Hahn J., Wang G., "Quadratic autoencoder (Q-AE) for low-dose CT denoising," IEEE Trans. Med. Imaging 39(6), 2035–2050 (2020). doi: 10.1109/TMI.2019.2963248
  • 35. Li M., Hsu W., Xie X., Cong J., Gao W., "SACNN: self-attention convolutional neural network for low-dose CT denoising with self-supervised perceptual loss network," IEEE Trans. Med. Imaging 39(7), 2289–2301 (2020). doi: 10.1109/TMI.2020.2968472
  • 36. Liang T., Jin Y., Li Y., Wang T., "EDCNN: edge enhancement-based densely connected network with compound loss for low-dose CT denoising," in 2020 15th IEEE International Conference on Signal Processing (ICSP), vol. 1 (IEEE, 2020), pp. 193–198.
  • 37. Chen K., Pu X., Ren Y., Qiu H., Li H., Sun J., "Low-dose CT image blind denoising with graph convolutional networks," in International Conference on Neural Information Processing (Springer, 2020), pp. 423–435.
  • 38. Chen Y.-J., Tsai C.-Y., Xu X., Shi Y., Ho T.-Y., Huang M., Yuan H., Zhuang J., "CT image denoising with encoder-decoder based graph convolutional networks," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) (IEEE, 2021), pp. 400–404.
  • 39. Odena A., Dumoulin V., Olah C., "Deconvolution and checkerboard artifacts," Distill 1(10), e3 (2016). doi: 10.23915/distill.00003
  • 40. Valsesia D., Fracastoro G., Magli E., "Deep graph-convolutional image denoising," IEEE Trans. on Image Process. 29, 8226–8237 (2020). doi: 10.1109/TIP.2020.3013166
  • 41. Johnson J., Alahi A., Fei-Fei L., "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision (Springer, 2016), pp. 694–711.
  • 42. Hendrycks D., Gimpel K., "Gaussian error linear units (GELUs)," arXiv preprint arXiv:1606.08415 (2016).
  • 43. Veličković P., Cucurull G., Casanova A., Romero A., Lio P., Bengio Y., "Graph attention networks," arXiv preprint arXiv:1710.10903 (2017).
  • 44. Simonyan K., Zisserman A., "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014).
  • 45. McCollough C. H., Bartley A. C., Carter R. E., Chen B., Drees T. A., Edwards P., Holmes III D. R., Huang A. E., Khan F., Leng S., McMillan K., Michalak G., Nunez K., Yu L., Fletcher J. G., "Low-dose CT for the detection and classification of metastatic liver lesions: results of the 2016 Low Dose CT Grand Challenge," Med. Phys. 44(10), e339–e352 (2017). doi: 10.1002/mp.12345
  • 46. Loshchilov I., Hutter F., "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101 (2017).
