Abstract
Sparse-view computed tomography (SVCT) aims to reconstruct a cross-sectional image using a reduced number of x-ray projections. While SVCT can efficiently reduce the radiation dose, the reconstruction suffers from severe streak artifacts, which are further amplified by the presence of metallic implants and could adversely impact medical diagnosis and other downstream applications. Previous methods have extensively explored either SVCT reconstruction without metallic implants, or full-view CT metal artifact reduction (MAR). The issue of simultaneous sparse-view and metal artifact reduction (SVMAR) remains under-explored, and directly applying previous SVCT and MAR methods to SVMAR may yield non-ideal reconstruction quality. In this work, we propose a dual-domain data consistent recurrent network, called DuDoDR-Net, for SVMAR. Our DuDoDR-Net aims to reconstruct an artifact-free image by recurrent image domain and sinogram domain restorations. To ensure the metal-free part of the acquired projection data is preserved, we also develop the image data consistent layer (iDCL) and sinogram data consistent layer (sDCL), which are interleaved in our recurrent framework. Our experimental results demonstrate that our DuDoDR-Net produces superior artifact-reduced results while preserving the anatomical structures, outperforming previous SVCT and SVMAR methods under different sparse-view acquisition settings.
Keywords: Sparse View, Metal Artifact, Computed Tomography, Recurrent Network, Dual-Domain Network, Data Consistency
1. Introduction
Computed tomography (CT) is a non-invasive imaging technique that visualizes an object's internal structures and has become an important tool for medical diagnosis and treatment planning. To reduce radiation dose and speed up acquisition (Strauss and Kaste, 2006), it is of great interest to reconstruct high-quality images from sparse-view computed tomography (SVCT) acquisition protocols, i.e., acquiring projection data with a view interval larger than normal. However, SVCT reconstructions based on classical filtered back-projection (FBP) contain severe streak artifacts due to the missing projection data in the sinogram. For patients with metallic implants, such as spinal implants and hip prostheses (Son et al., 2012; Roth et al., 2012), these streak artifacts are further amplified by the additional missing projection data in the metal trace. The amplified artifacts not only seriously affect the image quality for diagnostic purposes, but also make dose calculation challenging in radiation therapy (Giantsoudi et al., 2017) and affect attenuation correction in PET and SPECT (Abdoli et al., 2012; Kamel et al., 2003; Konishi et al., 2021). An example is shown in Figure 1. With the increasing use of metallic implants and the growing interest in reducing CT radiation dose, reconstructing high-quality CT images for patients with metallic implants under sparse-view conditions is an important problem in CT imaging.
Fig. 1.
An example of a CT image with metallic implants under a sparse-view condition. Left: metal-free CT image with a full-view acquisition. Middle: artifact-free CT image overlaid with the metal segmentation (red mask) used for simulation. Right: sparse-view CT image with metallic implants. A ×4 under-sampling rate is used. The display window is [−1000 1000] HU.
For SVCT reconstruction, numerous methods have been proposed, falling into two general categories: model-based iterative reconstruction (MBIR) and deep learning based reconstruction (DLR). MBIR methods reconstruct images from sparse-view projection data by iteratively minimizing predefined image domain regularizers and the inconsistency with the sampled projection data. Common choices of regularizer include total variation (Chambolle and Lions, 1997), nonlocal patches (Zhang et al., 2013a), and dictionary learning (Xu et al., 2012). While MBIR methods can generate high-quality reconstructions, they rely on repetitive forward- and back-projections and are thus computationally heavy. In addition, regularization based solely on prior assumptions requires careful hyper-parameter tuning to avoid biases in the reconstructions, especially when the sparse-view under-sampling rate is high.
On the other hand, deep learning methods have been widely adopted for SVCT and have demonstrated promising reconstruction performance. Jin et al. (2017) and Chen et al. (2017) first proposed to use a UNet (Ronneberger et al., 2015) and a residual UNet, respectively, to reduce streak artifacts in SVCT in the image domain. Yang et al. (2018) and Liao et al. (2018) further enhanced network performance by including adversarial loss and perceptual loss in the training. Later, Zhang et al. (2018) and Han and Ye (2018) proposed to incorporate dense blocks and wavelet decomposition into the UNet for more robust feature learning of streak artifact reduction. Among sinogram-domain methods, Lee et al. (2018) found that synthesizing a complete sinogram from the sparse-view sinogram followed by FBP reconstruction could also generate high-quality images. More recently, Zhou et al. (2021c) proposed a cascaded network with a projection data fidelity layer to ensure the sampled sinogram data are preserved, and achieved superior SVCT reconstruction performance. Shen et al. (2019, 2021) investigated SVCT under ultra-sparse settings and demonstrated promising reconstruction performance. However, none of these SVCT algorithm designs considered the presence of metallic implants, which further degrade the image quality. Directly adapting previous SVCT methods to SVCT with metallic implants may therefore lead to sub-optimal reconstruction performance.
For metal artifact reduction (MAR), numerous methods have been proposed for full-view CT. Because the metal artifacts are non-local in the reconstruction and originate from the metal-affected regions in the sinogram, traditional MAR methods mainly addressed this problem in the sinogram domain. Kalender et al. (1987) first proposed a linear interpolation-based method that substitutes the metal-affected sinogram regions with the linear interpolation of the neighboring unaffected data for each projection view. However, the inconsistency between the interpolated values and the unaffected values often introduces new artifacts in the reconstructions. To improve the value estimation in the metal-affected sinogram regions, several methods utilize the forward projection of synthesized prior images for sinogram completion (Zhang et al., 2013b; Meyer et al., 2010; Wang et al., 2013). Specifically, these methods first estimate an artifact-reduced prior image based on tissue properties in CT, such as Hounsfield Unit values. Then, a forward projection of the prior image is utilized to complete the metal-affected sinogram regions. For instance, NMAR (Meyer et al., 2010) generates a prior image by a multi-threshold segmentation of the initially reconstructed image (Bal and Spies, 2006). The forward projection of the prior image is used to normalize the sinogram before linear interpolation, thus improving the value estimation in the metal-affected sinogram regions.
On the other hand, deep learning-based MAR methods have also been extensively explored in recent years. Gjesteby et al. (2017) proposed to use a three-layer CNN to suppress the residual artifacts in NMAR (Meyer et al., 2010) images. Huang et al. (2018) developed a deep residual network, called RL-ARCNN, for MAR in cervical CT. Wang et al. (2018) proposed to use pix2pix (Isola et al., 2017) for MAR in ear CT. Unlike these image-to-image translation methods, Zhang and Yu (2018) proposed to use a CNN to first predict an artifact-suppressed prior image and then use its forward projection to aid the correction of the metal-affected sinogram regions. Extending these ideas, Lin et al. (2019) proposed a dual-domain learning method, called DuDoNet, that first corrects the metal-affected sinogram with a sinogram enhancement network and then feeds the corrected image into an image domain network for the final reconstruction. Lyu et al. (2020) further improved DuDoNet's performance by encoding the metal segmentation in both domains. More recently, Yu et al. (2020) proposed another dual-domain learning method that switches the order of the dual-domain correction in DuDoNet. Specifically, they found that deep sinogram completion based on the prior image synthesized from the image-domain network and the linearly interpolated sinogram yields superior MAR performance. However, none of the above MAR algorithms address MAR in SVCT, where streak artifacts are significantly amplified. Moreover, except for the image-to-image based MAR algorithms (Gjesteby et al., 2017; Wang et al., 2018; Huang et al., 2018), these MAR algorithms are hard to adapt directly to the sparse-view and metal artifact reduction (SVMAR) problem, as they require customized operations in the sinogram.
As the reconstruction quality is seriously degraded when these two conditions are present simultaneously, stand-alone SVCT methods and MAR methods may not be suitable for SVMAR. Recently, Ketcha et al. (2021) proposed a two-step solution to address SVMAR in CBCT. They first used a sinogram network to predict the full-view sinogram from the sparse-view sinogram. Then, FBP reconstructions were generated from the predicted sinograms and input into an image network for image-domain artifact reduction. While their method generated promising results, the sinogram network and the image network are independently optimized, and the performance could be further improved if the two networks were optimized in an end-to-end fashion. The one-step predictions from the sinogram/image networks may also be prone to overfitting. In addition, their method cannot guarantee that the metal-free part of the acquired projection data is preserved in the final reconstruction. Incorporating projection data consistency into the network design could better preserve the image content and lead to a better reconstruction.
To address these issues, we develop a novel dual-domain data consistent recurrent network (DuDoDR-Net) for SVMAR. The pipeline of our DuDoDR-Net is depicted in Figure 2. We propose to reconstruct an artifact-free image by recurrent image domain and sinogram domain restorations using an Attention Residual Dense U-Net (AttRDUNet) with convolutional gated recurrent units (convGRU) embedded. To prevent overfitting in the recurrent learning and to ensure the metal-free part of the acquired projection data is preserved, we also develop the image data consistent layer (iDCL) and sinogram data consistent layer (sDCL), which are appended to the recurrent outputs of the image and sinogram restoration networks. Our DuDoDR-Net is trained in an end-to-end fashion with supervision losses in both the image and sinogram domains. Our experimental results show that our DuDoDR-Net can generate high-quality CT reconstructions for patients with metallic implants under different sparse-view acquisition protocols.
Fig. 2.
Schematic diagram of our dual-domain data consistent recurrent network (DuDoDR-Net) for simultaneous sparse-view and metal artifact reduction. Our DuDoDR-Net consists of an image restoration network Gimg and a sinogram restoration network Gsino, with image and sinogram data consistency layers (iDCL/sDCL) interleaved. Given an input CT image with sparse-view and metal artifacts Isvma reconstructed with FBP, the data iteratively go through Gimg and Gsino to restore the signal in the image and sinogram domains. The outputs of Gimg and Gsino pass through the iDCL and sDCL to ensure the metal-free part of the acquired projection data is preserved. The final reconstruction Iout is the output of the last recurrent stage.
2. Methods
2.1. Overview
The pipeline of our DuDoDR-Net is illustrated in Figure 2. Our DuDoDR-Net aims to simultaneously reduce the sparse-view and metal artifacts by recurrent image domain learning and sinogram domain learning. Specifically, DuDoDR-Net consists of a recurrent image restoration network Gimg and a recurrent sinogram restoration network Gsino, where Gimg and Gsino share the same architecture (Section 2.2). The outputs of Gimg and Gsino are followed by the iDCL and sDCL, respectively, to ensure the data consistency of the metal-free part of acquired projection data (Section 2.3).
Given the FBP image Isvma reconstructed from the sparse-view metal-affected sinogram Ssvma, we first input it into Gimg for image domain restoration. As the image content of Isvma is highly degraded (Figure 1), the image prediction Isyn from Gimg may suffer from poor image content fidelity. To improve it, we first input Isyn into the iDCL, where the image is forward projected into a sinogram and the metal-free part of the acquired projection data is integrated into the sinogram to ensure projection data consistency. Then, the generated data consistent sinogram Sidc is input into Gsino for sinogram content refinement. Similar to the iDCL, the refined sinogram Ssyn is then input into the sDCL, where the metal-free part of the acquired sinogram data is integrated into the whole sinogram to ensure sinogram data consistency, and the output sinogram is converted into the restored image Idc via filtered back-projection.
It has been demonstrated previously that one-step image reconstruction is prone to overfitting and yields sub-optimal performance (Schlemper et al., 2017; Zhou et al., 2021c). Thus, we use a recurrent learning method. Specifically, the restored image Idc from the previous stage is input again into DuDoDR-Net. The parameters of the encoder (E) and decoder (D) of Gimg and Gsino are kept identical over the different stages, so the model size is constant. In the bottlenecks of Gimg and Gsino, we use convolutional Gated Recurrent Units (convGRU) (Ballas et al., 2015), such that the image and sinogram restoration states are passed over the different learning stages. The final restored image Iout is the last stage's output of DuDoDR-Net.
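To make this data flow concrete, the following is a minimal Python sketch of the recurrent loop, assuming hypothetical callables G_img, G_sino, idcl, and sdcl with the interfaces described in Sections 2.2 and 2.3 (all names and signatures here are illustrative, not the authors' released code):

```python
def dudodr_forward(I_svma, S_svma, M_a, G_img, G_sino, idcl, sdcl, n_stages=3):
    """Sketch of the recurrent dual-domain pipeline in Figure 2.

    G_img/G_sino return (restored output, new convGRU hidden state);
    idcl/sdcl are the data consistency layers of Section 2.3.
    """
    I = I_svma                      # initial FBP reconstruction
    h_img = h_sino = None           # convGRU hidden states, carried across stages
    for _ in range(n_stages):
        I_syn, h_img = G_img(I, h_img)          # image-domain restoration
        S_idc = idcl(I_syn, S_svma, M_a)        # forward projection + data consistency
        S_syn, h_sino = G_sino(S_idc, h_sino)   # sinogram-domain refinement
        I = sdcl(S_syn, S_svma, M_a)            # data consistency + FBP back to image
    return I                        # final reconstruction I_out
```

Because the encoder/decoder weights are shared across stages, n_stages only changes the amount of computation, not the number of trainable parameters.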
2.2. Network Architecture
We propose an Attention Residual Dense UNet (AttRDUNet) for Gimg and Gsino. The network architecture is depicted in Figure 3. AttRDUNet is a U-shape network (Ronneberger et al., 2015) with an Attention Residual Dense Block (AttRDB) at each level of the encoder and decoder, and a convGRU in the bottleneck. Each AttRDB contains densely connected convolution layers (Huang et al., 2017), a squeeze-and-excitation (SE) layer (Hu et al., 2018), and a residual connection.
Fig. 3.
The network architecture of Attention Residual Dense U-Net (AttRDUNet) used for the image restoration network Gimg and sinogram restoration network Gsino in DuDoDR-Net (Figure 2). Our AttRDUNet consists of Attention Residual Dense Blocks (AttRDB) at different resolution levels, and convolutional GRU (convGRU) in the bottleneck for recurrent learning.
Given the input feature $F_{in}$, the output $F_t$ of the t-th convolution of the densely connected layers can be written as
$F_t = P_t(\{F_{in}, F_1, \dots, F_{t-1}\})$   (1)
where $P_t$ denotes the t-th convolution followed by Leaky-ReLU, and $\{\cdot\}$ denotes concatenation along the feature channel dimension. Here, the number of convolutions is t ≤ 4. Then, we use a 1×1 convolution layer $P_{dsf}$ to fuse $F_{in}$ and all the dense features, which can be written as:
$F_{dsf} = P_{dsf}(\{F_{in}, F_1, F_2, F_3, F_4\})$   (2)
Then, we re-calibrate the feature channels of $F_{dsf}$ by passing it through the SE layer $P_{se}$. The re-calibrated feature is summed element-wise with $F_{in}$ to generate the output feature. Thus, the AttRDB output can be written as:
$F_{out} = P_{se}(F_{dsf}) + F_{in}$   (3)
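A compact PyTorch sketch of one AttRDB following Eqs. (1)-(3) is shown below; the channel width, growth rate, SE reduction ratio, and Leaky-ReLU slope are illustrative assumptions, not the paper's exact hyper-parameters:

```python
import torch
import torch.nn as nn

class AttRDB(nn.Module):
    """Attention Residual Dense Block sketch: four densely connected 3x3
    convolutions (Eq. 1), a 1x1 fusion convolution P_dsf (Eq. 2), and an
    SE layer P_se with a residual connection (Eq. 3)."""
    def __init__(self, channels=64, growth=32, reduction=16):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + t * growth, growth, 3, padding=1)
            for t in range(4)])                                    # P_t, t = 1..4
        self.act = nn.LeakyReLU(0.2, inplace=True)
        self.fuse = nn.Conv2d(channels + 4 * growth, channels, 1)  # P_dsf
        self.se = nn.Sequential(                                   # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        feats = [x]
        for conv in self.convs:                     # dense connections (Eq. 1)
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        f_dsf = self.fuse(torch.cat(feats, dim=1))  # feature fusion (Eq. 2)
        return f_dsf * self.se(f_dsf) + x           # SE re-calibration + residual (Eq. 3)
```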
In the encoding path, each AttRDB's output passes through a 2×2 average pooling layer before being fed into the next AttRDB. The output of the encoding path is fed into the convGRU for recurrent learning. Given the previous hidden state $h_{t-1}$ and the input $x_t$, the convGRU output can be written as:
$z_t = \sigma(W_z * x_t + U_z * h_{t-1})$   (4)

$r_t = \sigma(W_r * x_t + U_r * h_{t-1})$   (5)

$\tilde{h}_t = \tanh(W * x_t + U * (r_t \odot h_{t-1}))$   (6)

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$   (7)
where σ, ⊙, and * denote the element-wise sigmoid, element-wise multiplication, and convolution operations, respectively, and W and U are 2D convolutional kernels of size 3 × 3. Here, we use a two-layer structure where two convGRUs are stacked to form a stacked convGRU, with the second convGRU taking the outputs of the first and computing the final output. The hidden states are passed to the stacked convGRUs of Gimg and Gsino in the next image/sinogram restoration stage, while the final output of the convGRU is used for decoding. In the decoding path, the input to each AttRDB is the element-wise summation of the skip connection from the encoding path and the transposed convolutional feature from the previous decoding block.
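For reference, a minimal convGRU cell implementing Eqs. (4)-(7) could be sketched as follows (bias terms omitted for brevity; in the bottleneck two such cells would be stacked):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell sketch for Eqs. (4)-(7): 3x3 convolutions
    replace the matrix products of a standard GRU."""
    def __init__(self, channels):
        super().__init__()
        def conv3x3():
            return nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.w_z, self.u_z = conv3x3(), conv3x3()   # update-gate kernels W_z, U_z
        self.w_r, self.u_r = conv3x3(), conv3x3()   # reset-gate kernels W_r, U_r
        self.w, self.u = conv3x3(), conv3x3()       # candidate-state kernels W, U

    def forward(self, x, h_prev=None):
        if h_prev is None:                          # first recurrent stage
            h_prev = torch.zeros_like(x)
        z = torch.sigmoid(self.w_z(x) + self.u_z(h_prev))    # update gate (Eq. 4)
        r = torch.sigmoid(self.w_r(x) + self.u_r(h_prev))    # reset gate (Eq. 5)
        h_new = torch.tanh(self.w(x) + self.u(r * h_prev))   # candidate state (Eq. 6)
        return (1 - z) * h_prev + z * h_new                  # new hidden state (Eq. 7)
```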
2.3. Data Consistency Layer
The goal of the Data Consistency Layer (DCL) is to ensure that the metal-free part of the acquired projection data is preserved in the final reconstruction. As illustrated in Figure 2, we use the iDCL and sDCL at the outputs of Gimg and Gsino, respectively. The detailed structures of the iDCL and sDCL are illustrated in Figure 4. Given the metal trace $M_{mt}$ (true in metal-affected sinogram regions) and the sparse-view sinogram mask $M_{sv}$ (true in non-acquired sinogram regions), we first generate a binary mask for the sinogram regions that require restoration via:
$M_a = M_{mt} \cup M_{sv}$   (8)
Then, the binary mask for sinogram regions that do not require restoration is 1 − Ma.
Fig. 4.
The architectures of the image data consistency layer (iDCL) and the sinogram data consistency layer (sDCL) used in DuDoDR-Net (Figure 2). We use the union of the metal trace Mmt and the sparse-view mask Msv to generate a combined mask Ma, which indicates the sinogram regions that require data from network synthesis, while the data in the other sinogram regions are the metal-free part of the acquired projection data that should be preserved. In both the iDCL and sDCL, Ma is used to combine the sparse-view sinogram Ssvma and the network-predicted sinogram Sfp/Ssyn to generate data consistent outputs.
For the iDCL, given an input image $I_{syn}$, we first perform a forward projection to generate its sinogram by:
$S_{fp} = P_{fp}(I_{syn})$   (9)
where $P_{fp}$ is the forward projection layer (Zhou et al., 2021c). Then, we combine the forward-projected sinogram with the acquired sinogram $S_{svma}$ via:
$S_{idc} = M_a \odot S_{fp} + (1 - M_a) \odot S_{svma}$   (10)
where ⊙ is the element-wise multiplication. The final output of iDCL is a data consistent sinogram Sidc that is then input into Gsino for sinogram refinement.
For the sDCL, given a refined sinogram $S_{syn}$, we compute the data consistent sinogram via:
$S_{sdc} = M_a \odot S_{syn} + (1 - M_a) \odot S_{svma}$   (11)
Then, we apply filtered back-projection to reconstruct a data consistent image by
$I_{dc} = P_{fbp}(S_{sdc})$   (12)
where $P_{fbp}$ is the filtered back-projection layer (Zhou et al., 2021c). The iDCL and sDCL are embedded in our DuDoDR-Net to ensure that the metal-free part of the acquired projection data is preserved in the restoration outputs across the recurrent learning stages.
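Once differentiable projection operators are available, both layers reduce to simple masked combinations. A minimal sketch, assuming given P_fp/P_fbp operators and PyTorch tensors (function names illustrative):

```python
import torch

def make_restoration_mask(M_mt, M_sv):
    """Eq. (8): union of the metal trace and the sparse-view mask, marking
    sinogram regions whose values must come from the network."""
    return (M_mt.bool() | M_sv.bool()).float()

def idcl(I_syn, S_svma, M_a, P_fp):
    """Image data consistency layer, Eqs. (9)-(10). P_fp is a differentiable
    forward-projection operator (as in Zhou et al., 2021c), assumed given."""
    S_fp = P_fp(I_syn)                           # Eq. (9)
    return M_a * S_fp + (1 - M_a) * S_svma       # Eq. (10): keep acquired metal-free data

def sdcl(S_syn, S_svma, M_a, P_fbp):
    """Sinogram data consistency layer, Eqs. (11)-(12). P_fbp is a
    differentiable filtered back-projection operator, assumed given."""
    S_sdc = M_a * S_syn + (1 - M_a) * S_svma     # Eq. (11)
    return P_fbp(S_sdc)                          # Eq. (12): reconstruct I_dc
```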
2.4. Overall Objective Function
Our DuDoDR-Net learns to restore the degraded image in both the image and sinogram domains. Our loss function consists of three parts: an image domain loss, a sinogram domain loss, and a final reconstruction loss. The image domain loss directly supervises the recurrent outputs of Gimg by:
$\mathcal{L}_{img} = \sum_{i=1}^{n} \| I_{syn}^{i} - I_{gt} \|_1$   (13)
where $I_{syn}^{i}$ is the image restoration output at the i-th recurrent stage, with a total of n recurrent stages (Figure 2), and $I_{gt}$ is the ground-truth image without metallic implants reconstructed from a full-view acquisition. On the other hand, the sinogram domain loss directly supervises the recurrent outputs of Gsino by:
$\mathcal{L}_{sino} = \sum_{i=1}^{n} \| S_{syn}^{i} - S_{gt} \|_1$   (14)
where $S_{syn}^{i}$ is the sinogram restoration output at the i-th recurrent stage, and $S_{gt}$ is the ground-truth full-view sinogram without metallic implants. At the final output of DuDoDR-Net, we further supervise the final reconstruction by:
$\mathcal{L}_{out} = \| (1 - S) \odot (I_{out} - I_{gt}) \|_1$   (15)
where S is the metallic implant segmentation mask in the image domain. $\mathcal{L}_{out}$ supervises the image restoration in the non-metal regions. Finally, the overall loss function can be written as:
$\mathcal{L} = \alpha_{img} \mathcal{L}_{img} + \alpha_{sino} \mathcal{L}_{sino} + \alpha_{out} \mathcal{L}_{out}$   (16)
where we empirically set αimg = αsino = αout = 1 to achieve optimal performance. Our DuDoDR-Net is trained in an end-to-end fashion by optimizing the loss function.
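A sketch of how this objective could be computed in PyTorch is given below; the mean reduction and all variable names are illustrative assumptions consistent with Eqs. (13)-(16):

```python
import torch

def dudodr_loss(I_syn_list, S_syn_list, I_out, I_gt, S_gt, metal_mask,
                a_img=1.0, a_sino=1.0, a_out=1.0):
    """Overall objective of Eq. (16); metal_mask is the image-domain metal
    segmentation S (1 inside metal, 0 elsewhere)."""
    l_img = sum(torch.abs(i - I_gt).mean() for i in I_syn_list)    # Eq. (13)
    l_sino = sum(torch.abs(s - S_gt).mean() for s in S_syn_list)   # Eq. (14)
    l_out = torch.abs((1 - metal_mask) * (I_out - I_gt)).mean()    # Eq. (15)
    return a_img * l_img + a_sino * l_sino + a_out * l_out         # Eq. (16)
```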
3. Experiments
3.1. Data Preparation
We trained and evaluated our method using realistically simulated sparse-view CT images with metallic implants. Similar to the data preparation procedures in Lin et al. (2019), Yu et al. (2020), and Lyu et al. (2020), we randomly chose 1200 CT images from the DeepLesion dataset (Yan et al., 2018) and collected 100 manually segmented metal implants with various locations, shapes, and sizes from Zhang and Yu (2018). We randomly chose 1000 CT images and 90 metal masks to synthesize the training data. The remaining 10 metal masks were paired with the remaining 200 CT images to generate 2000 combinations for evaluation.
We used the same x-ray projection protocol as in Lin et al. (2019), Yu et al. (2020), and Lyu et al. (2020) to simulate the sparse-view metal-affected sinograms and CT images by inserting metallic implants into clean CT images. Specifically, we assumed an equiangular fan-beam projection geometry with a 120 kVp polyenergetic x-ray source. To simulate Poisson noise, we assumed the incident x-ray beam contains 2 × 10^7 photons. For each image, the fully sampled sinogram was generated via 360 projection views uniformly spaced between 0 and 360 degrees. To generate the sparse-view metal-affected sinogram Ssvma, we uniformly sampled 180, 90, and 60 projection views from the 360 projection views, mimicking 2-, 4-, and 6-fold radiation dose reduction. The corresponding sparse-view metal-affected image Isvma was reconstructed from Ssvma by filtered back-projection. The CT images were resized to 416 × 416 before the simulation, resulting in sinograms with a size of 641 × 640.
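The sparse-view inputs can be generated by retaining every k-th view of the fully sampled sinogram. A small NumPy sketch of this step (the view-axis layout is an assumption):

```python
import numpy as np

def sparsify_sinogram(S_full, k, view_axis=0):
    """Keep every k-th projection view (k = 2, 4, or 6 here) and return the
    zero-filled sparse sinogram S_svma plus the mask M_sv of non-acquired
    views (True = missing), as used in Eq. (8)."""
    M_sv = np.ones(S_full.shape, dtype=bool)
    idx = [slice(None)] * S_full.ndim
    idx[view_axis] = np.arange(0, S_full.shape[view_axis], k)  # acquired views
    M_sv[tuple(idx)] = False
    S_svma = np.where(M_sv, 0.0, S_full)       # zero out the missing views
    return S_svma, M_sv
```

For example, k = 4 keeps 90 of the 360 simulated views, corresponding to the ×4 setting above.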
3.2. Implementation Details
We implemented our DuDoDR-Net in Python with PyTorch. We trained our DuDoDR-Net in an end-to-end manner with differentiable forward projection and filtered back-projection layers, the same as in CasRedSCAN (Zhou et al., 2021c). During training, all images had a size of 416 × 416, and the corresponding sinograms had a size of 641 × 640. The Adam solver (Kingma and Ba, 2014) was used to optimize our network with the parameters (β1, β2) = (0.5, 0.999) and a learning rate of 2 × 10^-4. We trained for 600 epochs with a batch size of 2 on an NVIDIA Quadro RTX 8000 GPU with 48GB memory.
3.3. Experimental Results
For quantitative evaluation, we assessed the reconstruction quality using Root Mean Square Error (RMSE) and the Structural Similarity Index (SSIM) by comparing the synthetic reconstructions to the ground truth reconstructions in the non-metal image regions. RMSE focuses on the recovery of the intensity profile, and SSIM focuses on the recovery of the structural profile. For the comparative study, we first compared our results against two sparse-view deep learning reconstruction methods: the combination of DenseNet and deconvolution (DDNet) (Zhang et al., 2018) and FBPNet (Jin et al., 2017). For a fair comparison, both models were trained using a modified image domain loss, as defined in Eq. 15, so that they focus on recovering the image signal in the non-metal regions. Then, we also compared against the previous state-of-the-art simultaneous sparse-view and metal artifact reduction method, CNNMAR (Ketcha et al., 2021). In both quantitative and qualitative evaluations, we performed evaluations under different combinations of sparse-view and metal insertion settings.
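Both metrics are computed only over the non-metal image regions. A sketch of one way to do this, assuming scikit-image and HU-scaled NumPy arrays (the exact masking protocol for SSIM is an assumption):

```python
import numpy as np
from skimage.metrics import structural_similarity

def masked_rmse(pred, gt, metal_mask):
    """RMSE in HU over non-metal pixels (metal_mask: True inside metal)."""
    valid = ~metal_mask
    return float(np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2)))

def masked_ssim(pred, gt, metal_mask, data_range=None):
    """SSIM with metal pixels replaced by the ground truth so that only
    non-metal regions contribute to the score."""
    pred = np.where(metal_mask, gt, pred)
    if data_range is None:
        data_range = gt.max() - gt.min()
    return structural_similarity(pred, gt, data_range=data_range)
```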
The qualitative comparison of different reconstruction methods under different sparse-view settings is shown in Figure 5. As we can observe from the ×2 SV experiment (1st row), the metallic implant is the primary cause of image quality degradation. Severe image intensity and structure distortions can be found in the spine and pelvic bone regions. Previous methods cannot fully recover the uniform intensity inside the spine region and have difficulties removing the streak artifacts. Previous methods are also prone to diminishing the pelvic bone signal adjacent to the metal implants and may introduce new artifacts. In contrast, our DuDoDR-Net produces reconstructions whose intensity and structure best match the ground-truth image. Similar observations can be made in the ×4 SV experiment (2nd row). With metal implanted in the spine region, image quality is seriously degraded by both sparse-view and metal artifacts. Previous methods have a hard time recovering the soft-tissue regions adjacent to the metal implants, whereas our DuDoDR-Net preserves important soft-tissue structures, such as the aorta next to the spine. In the bowel regions, where metal and streak artifacts also seriously degrade the appearance, our method better preserves the soft-tissue signal and fine structures. In the last row, we can see that the sparse-view and metal artifacts are further amplified when the sparse-view under-sampling rate is increased to ×6. Compared to previous methods, our DuDoDR-Net can still better preserve the fine structures in both bone and soft-tissue regions. For example, the soft-tissue signals of the left common carotid artery and brachiocephalic artery were heavily diminished by previous methods, while our method preserves both artery structures in the chest region. Similarly, the rib structures were also difficult for previous methods to recover, while our method can accurately reconstruct their signal.
Fig. 5.
Visual comparison of sparse-view reconstructions with metallic implants under ×2, ×4, and ×6 sparse-view under-sampling rates. The metal regions are overlaid with the red masks. The zoom-in regions (blue and yellow boxes) are annotated on the ground-truth images. RMSE and SSIM values are computed for individual images (orange). The display window is [−1000 1000] HU.
Table 1 outlines the quantitative comparison of different methods under three different sparse-view acquisition settings. Compared to the best-performing previous method, CNNMAR (Ketcha et al., 2021), we improved the SSIM from 0.962 to 0.978 under the ×4 sparse-view setting and from 0.953 to 0.968 under the ×6 sparse-view setting. Similarly, we reduced the RMSE from 33.74 to 22.22 under the ×4 sparse-view setting and from 39.62 to 28.60 under the ×6 sparse-view setting.
Table 1.
Quantitative comparison of sparse-view reconstruction under three different sparse-view settings using SSIM and RMSE. Best results are marked in bold. The last column shows the number of trainable parameters of each method.
| Method | SSIM (1/6) | SSIM (1/4) | SSIM (1/2) | RMSE (1/6) | RMSE (1/4) | RMSE (1/2) | # Parameters |
|---|---|---|---|---|---|---|---|
| FBP | .061 ± .026 | .098 ± .034 | .286 ± .073 | 673.31 ± 171.73 | 490.14 ± 123.03 | 244.02 ± 63.23 | - |
| FBPNet (Jin et al., 2017) | .926 ± .015 | .944 ± .012 | .970 ± .008 | 46.08 ± 9.48 | 35.12 ± 7.75 | 22.31 ± 6.83 | 30M |
| DDNet (Zhang et al., 2018) | .892 ± .016 | .833 ± .018 | .958 ± .009 | 57.45 ± 12.63 | 49.53 ± 8.34 | 27.84 ± 7.98 | 0.56M |
| CNNMAR (Ketcha et al., 2021) | .953 ± .013 | .962 ± .011 | .977 ± .008 | 39.62 ± 11.89 | 33.74 ± 11.73 | 24.58 ± 10.08 | 2.08M |
| Ours | **.968 ± .009** | **.978 ± .006** | **.986 ± .004** | **28.60 ± 8.48** | **22.22 ± 7.35** | **17.35 ± 5.27** | 2.97M |
On the other hand, we further evaluated our model's performance when metallic implants of different sizes were introduced. Figure 6 illustrates the reconstructions by our model with small to large metallic implants under both ×2 and ×4 sparse-view conditions. As we can see, our method consistently recovers the image appearance across various sizes of metallic implants and different sparse-view acquisition conditions. Table 2 summarizes the quantitative evaluations of our model under the ×4 sparse-view setting when metallic implants of different sizes were inserted. In general, images with large metallic implants were harder to recover, resulting in lower SSIM compared to images with small metallic implants. However, our DuDoDR-Net consistently keeps the SSIM over 0.972 across the various sizes of metallic implants, higher than the previous methods in each metallic implant setting.
Fig. 6.
Visual comparison of sparse-view reconstructions with small to large metallic implants (top to bottom) under ×2 and ×4 sparse-view settings. The metal regions are overlaid with the red masks. RMSE and SSIM values are computed for individual images (orange). The display window is [−1000 1000] HU.
Table 2.
Quantitative comparison of sparse-view reconstructions with metallic implants of different sizes (largest to smallest, left to right) under the ×4 sparse-view acquisition setting. Each cell reports SSIM / RMSE. Best results are marked in bold.
| SSIM / RMSE | Largest | Large | Medium | Small | Smallest | Average |
|---|---|---|---|---|---|---|
| FBP | .081 / 603.55 | .088 / 494.84 | .107 / 431.75 | .142 / 392.82 | .162 / 314.01 | .098 / 490.14 |
| FBPNet (Jin et al., 2017) | .945 / 34.95 | .940 / 47.53 | .946 / 31.62 | .947 / 30.71 | .949 / 29.61 | .944 / 35.12 |
| DDNet (Zhang et al., 2018) | .832 / 53.19 | .831 / 57.06 | .837 / 45.04 | .833 / 45.67 | .840 / 42.84 | .833 / 49.53 |
| CNNMAR (Ketcha et al., 2021) | .956 / 43.58 | .952 / 51.44 | .968 / 24.48 | .969 / 23.75 | .964 / 34.71 | .962 / 33.74 |
| Ours | **.975 / 24.19** | **.972 / 37.85** | **.981 / 18.61** | **.982 / 16.95** | **.982 / 16.97** | **.978 / 22.22** |
3.4. Ablative Studies
1). Effectiveness of dual-domain learning:
In our network, each recurrent block consists of an image-domain network and a sinogram-domain network. To validate the effectiveness of the dual-domain learning design, we also trained an image-domain-only recurrent network with DCL and compared it against our dual-domain DuDoDR-Net. A visual comparison is illustrated in Figure 7. As we can observe, the reconstruction from the image-domain recurrent network contains more residual artifacts than that from our DuDoDR-Net.
Fig. 7.
Visual comparison of reconstructions under the ×4 sparse-view setting. The bottom row compares the reconstructions from the image-domain-only network against our DuDoDR-Net. The display window is [−1000 1000] HU.
The quantitative results of the domain analysis are summarized in Table 3, where we analyzed the performance under all three sparse-view conditions. The dual-domain learning achieves higher SSIM and lower RMSE values than the image-domain-only learning. It is worth noting that, as the sparse-view under-sampling rate increases, the performance of the image-domain learning decreases significantly, while the dual-domain learning maintains an SSIM above 0.968 and an RMSE below 28.60.
Table 3.
Quantitative analysis of the dual-domain learning used in our DuDoDR-Net. Image vs dual-domain-based learning is evaluated under ×2, ×4, and ×6 sparse-view conditions.
| SSIM / RMSE | ×6 | ×4 | ×2 |
|---|---|---|---|
| Image-Domain Only | .931 / 42.84 | .961 / 32.38 | .985 / 25.24 |
| Dual-Domain | **.968 / 28.60** | **.978 / 22.22** | **.986 / 17.35** |
2). Impact of the number of dual-domain recurrent blocks:
In our DuDoDR-Net, the number of recurrent blocks can be flexibly adjusted. Thus, we analyzed the impact of increasing the number of recurrent blocks on the reconstruction performance. Figure 8 illustrates one example of how the reconstruction quality gradually improves as the number of recurrent blocks increases. As we can observe, the discontinuous bone artifacts on the sternum are gradually removed as the number of recurrent blocks increases. The quantitative evaluations with SSIM and RMSE are plotted in Figure 9. Using more recurrent blocks boosts the reconstruction performance, and the improvement starts to converge after the number of recurrent blocks reaches 3. For example, increasing the number of recurrent blocks from 4 to 5 increases the SSIM by less than 0.003 and reduces the RMSE by less than 1 on average.
Fig. 8.
Comparison of reconstructions using different numbers of recurrent blocks (N) in our DuDoDR-Net. Artifacts on the sternum are gradually removed as the number of recurrent blocks increases. The display window is [−1000 1000] HU.
Fig. 9.
The effect of increasing the number of recurrent blocks in our DuDoDR-Net. Both RMSE (blue curve) and SSIM (red curve) were evaluated and plotted.
4. Discussion
In this work, we designed a novel framework, called DuDoDR-Net, for SVMAR. Specifically, we developed three key components in our DuDoDR-Net that enable reasonable reconstruction when sparse-view acquisition and metallic implants are both present. First, we proposed to generate an artifact-free reconstruction by dual-domain learning, consisting of image domain and sinogram domain learning. The initial FBP reconstruction is first input into an image domain network for initial image-domain artifact removal. Then, the forward projection layer converts it into a sinogram, and a sinogram domain network further corrects the signal. Ketcha et al. (2021) also utilized a dual-domain strategy, using independent sinogram domain and image domain networks for two-step restorations. However, their dual-domain strategy, without differentiable forward projection or filtered back-projection layers, cannot enable end-to-end dual-domain learning. The end-to-end dual-domain learning strategy has also proven useful in full-view CT MAR (Yu et al., 2020; Lin et al., 2019; Lyu et al., 2020; Wang et al., 2021a). However, the previous dual-domain networks proposed for full-view CT MAR only consider single-step dual-domain restoration, and a single-step network design may be prone to overfitting. Thus, our second component, recurrent learning, borrows from MBIR the idea of reconstructing an artifact-free image in an iterative fashion. To prevent overfitting in our dual-domain recurrent learning and to ensure the metal-free part of the acquired projection data is preserved, our third component comprises the iDCL and sDCL interleaved in our recurrent network. As we can see from Figures 8 and 9, compared to single-step dual-domain learning (N = 1), our dual-domain recurrent learning with iDCL/sDCL can reasonably prevent overfitting and improve the reconstruction quality.
We demonstrated the feasibility of using our DuDoDR-Net for CT reconstruction under three different sparse-view conditions with different metallic implants, as shown in our results section. Our DuDoDR-Net consistently achieves the best performance among the compared methods. However, we observe that bigger metallic implants combined with higher sparse-view under-sampling rates are harder to reconstruct than smaller metallic implants at lower under-sampling rates (Tables 1 and 2). For example, the RMSE increases from 17.35 to 28.60 when the under-sampling rate increases from ×2 to ×6. While our method maintains an RMSE below 28.60 even under a high under-sampling rate, i.e., ×6, we believe a proper under-sampling rate should be chosen to balance image quality against radiation dose, taking the metal size into account.
The presented work also has potential limitations. First of all, even though increasing the number of recurrent blocks in our DuDoDR-Net improves the reconstruction quality, the memory consumption and computational complexity increase, and longer training and inference times are required as the number of recurrent blocks increases. As we can see in Figure 9, the performance gain starts to converge after N = 3. In our experiments, we set N = 3 to balance memory consumption and inference time. The inference time of our DuDoDR-Net is less than 600 ms on average for each image reconstruction, which provides a reasonable reconstruction speed for clinical practice. Secondly, since it is clinically infeasible to acquire real metal-free and metal-inserted sparse-view CT data for training and no public real projection data are available, we utilized a realistic simulation program to produce synthesized training pairs from clinical metal-free CT images (Yan et al., 2018), which is a common data preparation strategy in full-view CT MAR (Yu et al., 2020; Lin et al., 2019; Lyu et al., 2020; Wang et al., 2021a). However, the quality of the simulated data may affect the network performance, due to factors such as the limited variability of metal masks/shapes and non-matching x-ray exposure settings. In future studies, we will investigate our DuDoDR-Net's performance on real patients with metallic implants who undergo sparse-view acquisition, and study how to create a better simulation dataset to further improve the performance on real patient data. Finally, DuDoDR-Net requires the metal mask as one of its inputs. In clinical practice, one can utilize semi-automatic approaches, such as thresholding with manual adjustments, to obtain the metal mask. Fully automatic deep learning models could also be trained to provide the metal mask for our DuDoDR-Net, for example learning-based metal segmentation (Gottschalk et al., 2021). Such a fully automatic metal segmentation model could also potentially be combined with our DuDoDR-Net to further improve the reconstruction quality.
The design of our DuDoDR-Net also suggests several interesting topics for future studies. In this work, we only explored sparse-view acquisition, while limited-angle acquisition is also common in many CT systems, such as dental CT and C-arm CBCT (Orth et al., 2008; Zhou et al., 2019, 2021a). By adjusting the sinogram mask Msv in Eq. 8, our DuDoDR-Net can be flexibly adapted to limited-angle acquisitions and other limited-view acquisition scenarios, as illustrated in the sketch below. Secondly, combining low-dose/low-current CT with sparse-view settings could further reduce the radiation dose. Shan et al. (2019) showed that an image-domain-only cascade network design can efficiently reduce the noise in low-dose CT. As the single-domain cascade network is potentially efficient for low-dose CT, our dual-domain recurrent learning method could also potentially be adapted to sparse-view low-dose CT with metallic implants. Thirdly, CT is an essential component of PET/SPECT for attenuation correction. Dose reduction methods have been extensively explored for stand-alone modalities (Chen et al., 2017; Lu et al., 2019; Zhou et al., 2021b; Chen et al., 2021), while joint-modality low-dose reconstruction methods, such as low-dose PET-CT, remain under-explored. Specifically, information from different modalities under low-dose protocols could be mutually beneficial and may improve the reconstruction of both modalities. Since our DuDoDR-Net may also be applied to different tomographic imaging modalities, we believe it could potentially be integrated into a joint low-dose reconstruction framework for multi-modality imaging, such as PET-CT. Lastly, our DuDoDR-Net is an open framework with flexibility in its backbone networks. While we used a classical U-shape network with AttRDB and convGRU as our backbone, we do not claim the optimality of this design for SVMAR. Other network architectures, such as vision transformers (Liu et al., 2021; Wang et al., 2021b), could be deployed in our DuDoDR-Net and may yield better reconstruction performance.
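As a simple illustration of the first point, a limited-angle Msv differs from the sparse-view mask only in which views it marks as missing. A hypothetical NumPy sketch (views × detectors layout assumed):

```python
import numpy as np

def limited_angle_mask(n_views, n_detectors, acquired_deg=(0, 90), total_deg=360):
    """M_sv for a limited-angle acquisition: views outside the acquired
    angular range are marked True (missing), in contrast to the uniformly
    strided views removed in the sparse-view setting."""
    angles = np.linspace(0, total_deg, n_views, endpoint=False)
    missing = (angles < acquired_deg[0]) | (angles >= acquired_deg[1])
    return np.broadcast_to(missing[:, None], (n_views, n_detectors)).copy()
```

The rest of the pipeline, including Eq. (8) and the iDCL/sDCL, would remain unchanged.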
5. Conclusion
We present a dual-domain data consistent recurrent network, called DuDoDR-Net, for simultaneous sparse-view and metal artifact reduction in CT. The DuDoDR-Net reconstructs an artifact-free image by recurrent image domain and sinogram domain restorations, where the iDCL/sDCL are interleaved in the recurrent network to ensure the metal-free part of the acquired projection data is preserved. Experimental results demonstrate that our DuDoDR-Net provides superior reconstruction performance over existing methods under different sparse-view acquisitions when various metallic implants are present.
Highlights.
We propose a dual-domain data consistent recurrent network for simultaneous sparse view and metal artifact reduction in CT.
To ensure the metal-free part of acquired projection data is preserved, we develop the image data consistent layer (iDCL) and sinogram data consistent layer (sDCL) that are interleaved in our recurrent framework.
We demonstrated that our method can produce superior artifact-reduced results while preserving the anatomical structures and outperforming previous SVCT and SVMAR methods under different sparse-view acquisition settings.
Acknowledgments
This work was supported by funding from the National Institutes of Health (NIH) under grant number R01EB025468.
Credit authorship contribution statement
Bo Zhou: Conceptualization, Methodology, Software, Visualization, Validation, Formal analysis, Writing original draft. Xiongchao Chen: Writing - review and editing. S. Kevin Zhou: Conceptualization, Writing - review and editing. James S. Duncan: Conceptualization, Writing - review and editing, Supervision. Chi Liu: Conceptualization, Writing - review and editing, Supervision.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Abdoli M, Dierckx RA, Zaidi H, 2012. Metal artifact reduction strategies for improved attenuation correction in hybrid PET/CT imaging. Medical Physics 39, 3343–3360.
- Bal M, Spies L, 2006. Metal artifact reduction in CT using tissue-class modeling and adaptive prefiltering. Medical Physics 33, 2852–2859.
- Ballas N, Yao L, Pal C, Courville A, 2015. Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432.
- Chambolle A, Lions PL, 1997. Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188.
- Chen H, Zhang Y, Kalra MK, Lin F, Chen Y, Liao P, Zhou J, Wang G, 2017. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging 36, 2524–2535.
- Chen X, Zhou B, Shi L, Liu H, Pang Y, Wang R, Miller EJ, Sinusas AJ, Liu C, 2021. CT-free attenuation correction for dedicated cardiac SPECT using a 3D dual squeeze-and-excitation residual dense network. Journal of Nuclear Cardiology, 1–16.
- Giantsoudi D, De Man B, Verburg J, Trofimov A, Jin Y, Wang G, Gjesteby L, Paganetti H, 2017. Metal artifacts in computed tomography for radiation therapy planning: dosimetric effects and impact of metal artifact reduction. Physics in Medicine & Biology 62, R49.
- Gjesteby L, Yang Q, Xi Y, Zhou Y, Zhang J, Wang G, 2017. Deep learning methods to guide CT image reconstruction and reduce metal artifacts, in: Medical Imaging 2017: Physics of Medical Imaging, International Society for Optics and Photonics. p. 101322W.
- Gottschalk TM, Maier A, Kordon F, Kreher BW, 2021. Learning-based patch-wise metal segmentation with consistency check. arXiv preprint arXiv:2101.10914.
- Han Y, Ye JC, 2018. Framing U-Net via deep convolutional framelets: Application to sparse-view CT. IEEE Transactions on Medical Imaging 37, 1418–1429.
- Hu J, Shen L, Sun G, 2018. Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.
- Huang G, Liu Z, Van Der Maaten L, Weinberger KQ, 2017. Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
- Huang X, Wang J, Tang F, Zhong T, Zhang Y, 2018. Metal artifact reduction on cervical CT images by deep residual learning. BioMedical Engineering OnLine 17, 1–15.
- Isola P, Zhu JY, Zhou T, Efros AA, 2017. Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
- Jin KH, McCann MT, Froustey E, Unser M, 2017. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26, 4509–4522.
- Kalender WA, Hebel R, Ebersberger J, 1987. Reduction of CT artifacts caused by metallic implants. Radiology 164, 576–577.
- Kamel EM, Burger C, Buck A, von Schulthess GK, Goerres GW, 2003. Impact of metallic dental implants on CT-based attenuation correction in a combined PET/CT scanner. European Radiology 13, 724–728.
- Ketcha MD, Marrama M, Souza A, Uneri A, Wu P, Zhang X, Helm PA, Siewerdsen JH, 2021. Sinogram + image domain neural network approach for metal artifact reduction in low-dose cone-beam computed tomography. Journal of Medical Imaging 8, 052103.
- Kingma DP, Ba J, 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Konishi T, Shibutani T, Okuda K, Yoneyama H, Moribe R, Onoguchi M, Nakajima K, Kinuya S, 2021. Metal artifact reduction for improving quantitative SPECT/CT imaging. Annals of Nuclear Medicine 35, 291–298.
- Lee H, Lee J, Kim H, Cho B, Cho S, 2018. Deep-neural-network-based sinogram synthesis for sparse-view CT image reconstruction. IEEE Transactions on Radiation and Plasma Medical Sciences 3, 109–119.
- Liao H, Huo Z, Sehnert WJ, Zhou SK, Luo J, 2018. Adversarial sparse-view CBCT artifact reduction, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 154–162.
- Lin WA, Liao H, Peng C, Sun X, Zhang J, Luo J, Chellappa R, Zhou SK, 2019. DuDoNet: Dual domain network for CT metal artifact reduction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10512–10521.
- Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B, 2021. Swin Transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030.
- Lu W, Onofrey JA, Lu Y, Shi L, Ma T, Liu Y, Liu C, 2019. An investigation of quantitative accuracy for deep learning based denoising in oncological PET. Physics in Medicine & Biology 64, 165019.
- Lyu Y, Lin WA, Liao H, Lu J, Zhou SK, 2020. Encoding metal mask projection for metal artifact reduction in computed tomography, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 147–157.
- Meyer E, Raupach R, Lell M, Schmidt B, Kachelrieß M, 2010. Normalized metal artifact reduction (NMAR) in computed tomography. Medical Physics 37, 5482–5493.
- Orth RC, Wallace MJ, Kuo MD, Technology Assessment Committee of the Society of Interventional Radiology, 2008. C-arm cone-beam CT: general principles and technical considerations for use in interventional radiology. Journal of Vascular and Interventional Radiology 19, 814–820.
- Ronneberger O, Fischer P, Brox T, 2015. U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 234–241.
- Roth TD, Maertz NA, Parr JA, Buckwalter KA, Choplin RH, 2012. CT of the hip prosthesis: appearance of components, fixation, and complications. RadioGraphics 32, 1089–1107.
- Schlemper J, Caballero J, Hajnal JV, Price AN, Rueckert D, 2017. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 37, 491–503.
- Shan H, Padole A, Homayounieh F, Kruger U, Khera RD, Nitiwarangkul C, Kalra MK, Wang G, 2019. Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nature Machine Intelligence 1, 269–276.
- Shen L, Zhao W, Capaldi D, Pauly J, Xing L, 2021. A geometry-informed deep learning framework for ultra-sparse 3D tomographic image reconstruction. arXiv preprint arXiv:2105.11692.
- Shen L, Zhao W, Xing L, 2019. Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nature Biomedical Engineering 3, 880–888.
- Son SH, Kang YN, Ryu MR, 2012. The effect of metallic implants on radiation therapy in spinal tumor patients with metallic spinal implants. Medical Dosimetry 37, 98–107.
- Strauss KJ, Kaste SC, 2006. The ALARA (as low as reasonably achievable) concept in pediatric interventional and fluoroscopic imaging: striving to keep radiation doses as low as possible during fluoroscopy of pediatric patients - a white paper executive summary. Radiology 240, 621–622.
- Wang J, Wang S, Chen Y, Wu J, Coatrieux JL, Luo L, 2013. Metal artifact reduction in CT using fusion based prior image. Medical Physics 40, 081903.
- Wang J, Zhao Y, Noble JH, Dawant BM, 2018. Conditional generative adversarial networks for metal artifact reduction in CT images of the ear, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 3–11.
- Wang T, Xia W, Huang Y, Sun H, Liu Y, Chen H, Zhou J, Zhang Y, 2021a. DAN-Net: Dual-domain adaptive-scaling non-local network for CT metal artifact reduction. arXiv preprint arXiv:2102.08003.
- Wang W, Xie E, Li X, Fan DP, Song K, Liang D, Lu T, Luo P, Shao L, 2021b. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122.
- Xu Q, Yu H, Mou X, Zhang L, Hsieh J, Wang G, 2012. Low-dose X-ray CT reconstruction via dictionary learning. IEEE Transactions on Medical Imaging 31, 1682–1697.
- Yan K, Wang X, Lu L, Summers RM, 2018. DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of Medical Imaging 5, 036501.
- Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Kalra MK, Zhang Y, Sun L, Wang G, 2018. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging 37, 1348–1357.
- Yu L, Zhang Z, Li X, Xing L, 2020. Deep sinogram completion with image prior for metal artifact reduction in CT images. IEEE Transactions on Medical Imaging 40, 228–238.
- Zhang H, Huang J, Ma J, Bian Z, Feng Q, Lu H, Liang Z, Chen W, 2013a. Iterative reconstruction for X-ray computed tomography using prior-image induced nonlocal regularization. IEEE Transactions on Biomedical Engineering 61, 2367–2378.
- Zhang Y, Yan H, Jia X, Yang J, Jiang SB, Mou X, 2013b. A hybrid metal artifact reduction algorithm for X-ray CT. Medical Physics 40, 041910.
- Zhang Y, Yu H, 2018. Convolutional neural network based metal artifact reduction in X-ray computed tomography. IEEE Transactions on Medical Imaging 37, 1370–1381.
- Zhang Z, Liang X, Dong X, Xie Y, Cao G, 2018. A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Transactions on Medical Imaging 37, 1407–1417.
- Zhou B, Augenfeld Z, Chapiro J, Zhou SK, Liu C, Duncan JS, 2021a. Anatomy-guided multimodal registration by learning segmentation without ground truth: Application to intraprocedural CBCT/MR liver segmentation and registration. Medical Image Analysis 71, 102041.
- Zhou B, Lin X, Eck B, 2019. Limited angle tomography reconstruction: Synthetic reconstruction via unsupervised sinogram adaptation, in: International Conference on Information Processing in Medical Imaging, Springer. pp. 141–152.
- Zhou B, Tsai YJ, Chen X, Duncan JS, Liu C, 2021b. MDPET: A unified motion correction and denoising adversarial network for low-dose gated PET. IEEE Transactions on Medical Imaging.
- Zhou B, Zhou SK, Duncan JS, Liu C, 2021c. Limited view tomographic reconstruction using a cascaded residual dense spatial-channel attention network with projection data fidelity layer. IEEE Transactions on Medical Imaging.