Author manuscript; available in PMC 2020 Aug 7.
Published in final edited form as: Phys Med Biol. 2019 May 23;64(11):115004. doi: 10.1088/1361-6560/ab0dc0

Higher SNR PET Image Prediction Using a Deep Learning Model and MRI Image

Chih-Chieh Liu 1, Jinyi Qi 1
PMCID: PMC7413624  NIHMSID: NIHMS1614185  PMID: 30844784

Abstract

PET images often suffer from poor signal-to-noise ratio (SNR). Our objective is to improve the SNR of PET images using a deep neural network (DNN) model and MRI images, without requiring any higher-SNR PET images in training.

Methods

Our proposed DNN model consists of three modified U-Nets (3U-net). The PET training inputs and targets were reconstructed using filtered backprojection (FBP) and maximum-likelihood expectation maximization (MLEM), respectively. FBP reconstruction was chosen for its computational efficiency, so that the trained network not only removes noise but also accelerates image reconstruction. Digital brain phantoms downloaded from BrainWeb were used to evaluate the proposed method. Poisson noise was added to the sinogram data to simulate a 6-minute brain PET scan. The attenuation effect was included in the simulation and corrected before image reconstruction. Extra Poisson noise was introduced into the training inputs to improve the network's denoising capability. Three independent experiments were conducted to examine the reproducibility. A lesion was inserted into the testing data to evaluate the impact of mismatched MRI information using the contrast-to-noise ratio (CNR). The negative impact on noise reduction when miscoregistration occurs between PET and MRI images was also studied.

Results

Compared with a 1U-net trained with only PET images, training with PET/MRI decreased the mean squared error (MSE) by 31.3% and 34.0% for 1U-net and 3U-net, respectively. The MSE reduction is equivalent to increasing the count level by 2.5-fold and 2.9-fold for 1U-net and 3U-net, respectively. Compared with the MLEM images, the lesion CNR was improved 2.7-fold and 1.4-fold for 1U-net and 3U-net, respectively.

Conclusions

Our proposed method can improve the PET SNR without requiring higher-SNR PET images.

Keywords: PET/MRI, denoise, neural network, deep learning

INTRODUCTION

Deep learning (DL) has reemerged recently in many fields, including computer vision and speech recognition, owing to big data and groundbreaking GPU performance (LeCun et al 2015, Sze et al 2017). Sophisticated deep neural network (DNN) models have been proposed in the ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) competition, such as AlexNet (Krizhevsky et al 2012), VGG Net (Simonyan and Zisserman 2014), Microsoft ResNet (He et al 2015), and GoogLeNet (Szegedy et al 2015). DL has been quickly adopted in medical imaging applications for lesion detection (Esteva et al 2017), image segmentation (Ronneberger et al 2015) and registration, and automated diagnosis (Dolz et al 2016). DL has also been used in end-to-end training to enhance image quality, such as noise and artifact reduction, across many medical imaging modalities (Han et al 2016, Kang et al 2016, Gong et al 2017b, Xiang et al 2017, Wu et al 2017, Zhang and Yu 2017, Adler and Oktem 2017, Yang et al 2018, Kim et al 2018, Wang et al 2017). In this work, we propose a deep neural network adapted from the original U-Net (Ronneberger et al 2015, Han 2017) to predict positron emission tomography (PET) images with a signal-to-noise ratio (SNR) even beyond that of the training targets.

PET images suffer from high noise due to limited count statistics and the ill-posedness of the image reconstruction process. In addition to the widely used maximum-likelihood expectation maximization (MLEM) and ordered-subset expectation maximization (OSEM) (Shepp and Vardi 1982, Vardi et al 1985, Hudson and Larkin 1994), various methods have been proposed to address the ill-conditioning and to ease the influence of noise on PET image quality while preserving boundary information (Hsiao et al 2003, Chlewicki et al 2004, Somayajula et al 2011, Cheng-Liao and Qi 2011, Wang and Qi 2015, Mehranian et al 2012, Nuyts et al 2001). Hybrid PET/magnetic resonance imaging (MRI) has recently drawn attention in both clinical and preclinical studies, especially in neurology (Wehrl et al 2013, Heiss 2016). In contrast to PET, MRI offers superior soft-tissue contrast and higher spatial resolution. When multimodality images are available, a prior function derived from either computed tomography (CT) or MRI can be used in a maximum a posteriori (MAP) objective function. The image estimate is regularized by penalty weights computed from neighboring pixels in a prior image, which can be derived by non-local means (Bowsher et al 2004, Buades et al 2005), segmentation (Baete et al 2004), joint entropy (Somayajula et al 2005, Nuyts 2007, Tang and Rahmim 2015), sparse representation (Tang et al 2014, Tahaei et al 2016, Wang et al 2016) or a kernel-based method (Gong et al 2017a). Moreover, joint estimation of PET activity and attenuation has also been studied to improve quantitative accuracy using anatomical information from T1- or T2-weighted brain MRI (Mehranian et al 2017). Although these advanced image reconstruction algorithms can provide better noise performance, they incur considerable computational cost and it is often difficult to find optimal parameter settings.

DL has been used to improve PET image quality (Xiang et al 2017). Most existing DL-based methods require high-dose PET images for training, and the network output cannot outperform the training images. Unfortunately, high-dose PET images are not readily available in practice. Although it is possible to train a model without clean data (Lehtinen et al 2018), doing so requires multiple data sets acquired under identical conditions to prepare the training inputs and targets. In this work, our main objective was to improve the SNR of PET images using PET/MRI images in a DNN model without requiring high-SNR PET images or data from multiple acquisitions during training. To broaden the utility of our proposed method, we chose filtered backprojection (FBP) reconstructions as the training inputs and MLEM reconstructions as the training targets. As a result, the method also reduces the computation time of image reconstruction. Even though the computation time of current PET image reconstruction is not a critical issue on modern workstations, the acceleration could still be helpful for future PET scanners with a longer axial field of view, such as the total-body 2-m long EXPLORER scanner (Zhang et al 2017). Furthermore, the reconstruction algorithms used in this work could be replaced with any other advanced algorithms to obtain certain desired advantages. Our intention is not to compare the proposed model with other advanced reconstruction algorithms but to present a deep learning framework. To the best of our knowledge, no studies so far have exploited DNN models combined with noise-amplified training data to improve the PET SNR without requiring high-dose PET images or multiple acquisitions.

Our contributions in this work towards PET SNR improvement are twofold. First, we propose a three-U-Nets (3U-net) model and demonstrate that training it with PET/MRI images provides better improvement than using either only PET images or PET/MRI images in a one-U-Net (1U-net) model. Second, we show that adding extra Poisson noise to the training input data can further improve the PET SNR in the inference process without requiring additional data sets, whether high-dose PET images or data from multiple acquisitions, in the training.

METHODS

I. Brain Phantoms and PET/MRI Training Images

The training data were prepared with 21 digital brain phantoms based on real subjects, downloaded from the BrainWeb website (BrainWeb n.d., Collins et al 1998, Aubert-Broche et al 2006). The data set contains tissue-segmented brain phantoms as well as their T1-weighted MRI images, which were corrected for intensity non-uniformity. Owing to the limited GPU memory and the complexity of our proposed model, these phantom images were down-sampled from 256 × 256 × 181 to a common PET image dimension of 128 × 128 × 181 with a voxel size of 2 × 2 × 1 mm³. To create noise-free PET images, the PET activity ratio of gray matter to white matter in the phantoms was set to 4:1 and the other regions were assumed to have no PET activity. The noise-free PET images were first forward-projected, and then Poisson noise was introduced into the sinograms based on a 6-minute static FDG brain PET scan, which gives an average of 100M counts. The attenuation effect was modeled in the simulation for 8 different tissues in the tissue-segmented phantoms, with attenuation coefficients taken from the NIST table (Hubbell and Seltzer 2004). Attenuation correction was performed before image reconstruction.
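
For concreteness, the following is a minimal sketch of this noise simulation step in Python/NumPy. The `forward_project` system model and precomputed `attenuation_factors` are placeholders assumed for illustration; they are not part of the paper's code.

```python
import numpy as np

def simulate_noisy_sinogram(activity_image, forward_project, target_counts=100e6,
                            attenuation_factors=None, rng=None):
    """Forward-project a noise-free activity image and add Poisson noise
    scaled to a target total count level (100M counts ~ a 6-minute scan here)."""
    rng = np.random.default_rng() if rng is None else rng
    sino = forward_project(activity_image)        # noise-free line integrals
    if attenuation_factors is not None:
        sino = sino * attenuation_factors         # include the attenuation effect
    sino = sino * (target_counts / sino.sum())    # scale the mean to the target counts
    return rng.poisson(sino).astype(np.float64)   # one Poisson noise realization
```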

To create training data, we used an analytic reconstruction algorithm, FBP with a ramp filter, to reconstruct the input images, and a statistical iterative reconstruction algorithm, MLEM with 50 iterations, to reconstruct the target images for the network training. Since the aim of our work was to explore the ability of deep learning to predict higher-SNR (higher count level) PET images without requiring any existing higher-SNR PET images, extra Poisson noise was added to the simulated sinograms before the FBP reconstruction, but not before the MLEM reconstruction of the target images. Additionally, we took advantage of this step by drawing the extra Poisson noise nine different times to augment the training data. Note that no extra noise was added during testing. The T1-weighted MRI images of the phantoms were used together with the corresponding PET images in the training under the assumption of perfect coregistration. Each 2D FBP-reconstructed PET image was paired with the corresponding T1-weighted MRI image to form the two channels of each input, with the MLEM-reconstructed PET image as the target for supervised learning. One example of the training data is shown in Figure 1.
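
One plausible way to realize the extra-noise augmentation, sketched below, is to treat the rescaled noisy sinogram as the mean of a second Poisson draw at the desired extra-noise count level; this interpretation is an assumption for illustration, and the authors' exact procedure may differ in detail.

```python
import numpy as np

def add_extra_poisson_noise(noisy_sino, extra_counts=100e6, n_realizations=9, rng=None):
    """Generate extra-noise training inputs from one noisy sinogram by drawing
    a second Poisson realization whose total matches `extra_counts`, then
    rescaling back. Nine independent draws serve as data augmentation."""
    rng = np.random.default_rng() if rng is None else rng
    scale = extra_counts / noisy_sino.sum()
    return [rng.poisson(noisy_sino * scale) / scale for _ in range(n_realizations)]
```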

Figure 1.

An example of the training data at the 100M count level. (A) FBP image with no extra Poisson noise, (B) FBP image with extra Poisson noise, (C) MLEM image with 50 iterations, and (D) T1-weighted MRI image. The FBP image, either with or without extra noise, was paired with its MRI image as the network input, with the 50-iteration MLEM image as the training target.

To study the impact of miscoregistration between PET and MRI image pairs, two offsets, 0.5 mm and 1.0 mm, were separately introduced to the MRI images before the down-sampling in two scenarios. In the first scenario, the MRI images were translated along one randomly selected direction at a multiple of 45°. In the second scenario, the MRI images were translated along all directions at multiples of 45°. As a result, the total number of training data in the first scenario is the same as in the perfectly coregistered case, whereas the second scenario has eight times more training data and is consequently the same size as the noise-augmentation case. Furthermore, an additional offset of 2.0 mm was trained and evaluated in the second scenario. The miscoregistration was applied to all the data for training, validation and testing.
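
As an illustration, the two scenarios can be emulated per slice with SciPy as in the sketch below; the pixel size and linear interpolation order are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import shift

def translate_mri(mri_slice, offset_mm=0.5, pixel_mm=1.0, all_directions=False, rng=None):
    """Translate an MRI slice along direction(s) at multiples of 45 degrees to
    emulate PET/MRI miscoregistration. Scenario 1: one random direction;
    scenario 2: all eight directions (eight times more training data)."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.deg2rad(np.arange(8) * 45)
    if not all_directions:
        angles = angles[[rng.integers(8)]]        # scenario 1: one random direction
    offs = offset_mm / pixel_mm
    return [shift(mri_slice, (offs * np.sin(a), offs * np.cos(a)), order=1)
            for a in angles]
```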

All the PET and MRI images were also normalized, respectively, by their global maximum pixel values across all training images to constrain the range of pixel values. Eighteen phantoms were used as training data, two phantoms as validation data and the last phantom as test data.

II. Proposed Neural Network Model

Inspired by studies of synthesizing CT images from MRI images using deep learning (Han 2017), medical image segmentation (Ronneberger et al 2015), and image restoration (He et al 2015, Mao et al 2016), we adapted the original U-Net model for our PET image denoising purpose. As shown in Figure 2, the whole model is divided into two symmetric processes, i.e., encoding and decoding. Since the U-Net used in our work contains 25 convolutional layers, this deep model would converge slowly due to small gradients in the backpropagation during training. The layers with the same input dimension in the two processes are therefore connected by skip connections, which are element-wise summations of the feature maps from the encoding process and the decoding process after the rectified linear unit (ReLU). Max-pooling layers are used to speed up the training process through down-sampling and to make the learned features translation-invariant. We added four more convolutional layers to the bottom of the U-Net model, expecting the model to be more capable of correlating the multi-modality training inputs to the targets. With the skip connections, the details lost in the down-sampling by max-pooling layers can be preserved, and the vanishing gradient issue can be relieved (Mao et al 2016). The skip connections in this work connect the second or third convolutional layers at every level in the encoding process to pass denoising-relevant features to the decoding process. Unlike concatenating a feature map as an extra input to another layer as in other papers (Ronneberger et al 2015, Han 2017), the element-wise summation used here makes the model fit the residual instead of the original output. Consequently, the adapted U-Net model with skip connections can be trained more easily and effectively, even without batch normalization (BN) layers (Ioffe and Szegedy 2015). We also tested our proposed model with BN layers but found that the training loss fluctuated more over early iterations and the validation loss often drifted apart from the training loss (data not shown). Therefore, we decided not to use BN layers in the proposed models.
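
To make these architectural choices concrete, here is a reduced two-level sketch of the adapted U-Net in PyTorch; the paper's model has five max-pooling levels and 25 convolutional layers and was implemented in Caffe, so the layer widths and depth below are illustrative assumptions only.

```python
import torch.nn as nn

def conv(cin, cout):
    # 3x3 convolution + ReLU; the paper's model uses no batch-norm layers
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Two-level toy version of the adapted U-Net. Skips are element-wise sums,
    so the decoder fits a residual rather than the raw output."""
    def __init__(self, in_ch=2):
        super().__init__()
        self.enc1 = nn.Sequential(conv(in_ch, 32), conv(32, 32))
        self.enc2 = nn.Sequential(conv(32, 64), conv(64, 64))
        self.pool = nn.MaxPool2d(2)
        self.bottom = nn.Sequential(conv(64, 128), conv(128, 64))
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec2 = nn.Sequential(conv(64, 64), conv(64, 32))
        self.dec1 = conv(32, 32)
        self.out = nn.Conv2d(32, 1, 3, padding=1)   # final layer: no ReLU

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottom(self.pool(e2))
        d2 = self.dec2(self.up(b) + e2)             # element-wise sum skip
        d1 = self.dec1(self.up(d2) + e1)            # element-wise sum skip
        return self.out(d1)
```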

Figure 2.

Adapted U-Net model with asymmetric skip connections. Each box represents a feature map with the channel number on the top and the image size on the left. The network input and output data are in orange. The max-pooling operations are in red and the up-scaling operations in green. The skip connections between the layers in the encoding and decoding processes are represented by gray dashed lines. All convolutional layers in the model are followed by a ReLU activation layer except for the final one before the network output.

To extract more information from the MRI images, we further propose a three-U-Nets model (3U-net). The flowcharts of 1U-net and 3U-net are shown in Figure 3. First, two individual 1U-net networks were trained, one with the PET image and one with the MRI image as the input, both with the PET image as the target. Second, the learned weights and biases were used to initialize the first two U-Nets in 3U-net, and then the whole three-U-Net model was trained with the same set of PET and MRI training data.
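
A minimal sketch of the 3U-net composition follows, reusing the `MiniUNet` toy above. The exact fusion in the paper follows Figure 3; stacking the two single-channel outputs as a two-channel input to the third U-Net is an assumption made here for illustration.

```python
import torch
import torch.nn as nn

class ThreeUNet(nn.Module):
    """PET and MRI each pass through their own U-Net (pre-trained separately
    against the PET target), and a third U-Net fuses the two outputs."""
    def __init__(self, unet_pet, unet_mri, unet_fuse):
        super().__init__()
        self.unet_pet, self.unet_mri, self.unet_fuse = unet_pet, unet_mri, unet_fuse

    def forward(self, pet, mri):
        fused = torch.cat([self.unet_pet(pet), self.unet_mri(mri)], dim=1)
        return self.unet_fuse(fused)

# Pre-trained weights of the first two U-Nets serve as initialization,
# then all three are fine-tuned end-to-end:
# net = ThreeUNet(MiniUNet(in_ch=1), MiniUNet(in_ch=1), MiniUNet(in_ch=2))
```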

Figure 3.

Flowcharts of 1U-net and 3U-net. The flowchart of 1U-net is on the left and that of 3U-net on the right. The first row is the inference input data (training inputs in parentheses), the middle is the neural network model, and the bottom is the inference output (training targets, MLEM, in parentheses). PET and MRI images are used together as two channels in 1U-net, whereas in 3U-net they are fed to two individual U-Nets, denoted 1U PET and 1U MRI, whose trained weights initialize the first two U-Nets of 3U-net. The dimension of the PET and MRI images is 128 × 128.

A stochastic gradient descent (SGD) algorithm was used to minimize the loss function in the backpropagation process. The initial learning rate was 10⁻⁶ and was dropped by 20% every 10⁴ iterations. The momentum was 0.9. The total number of epochs was 1360 for 1U-net and 340 for each U-Net in 3U-net. The batch size was 32 for 1U-net and 16 for each U-Net in 3U-net. All the weights were initialized to a reasonable range by Xavier initialization (Glorot and Bengio 2010). To measure the variability of the training process, three independent runs of the training and testing process were conducted using different noise realizations of the data.
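
These hyper-parameters translate directly into a training loop; the sketch below re-expresses the Caffe solver settings in PyTorch for illustration (the `net` and `loader` objects are assumed to be constructed elsewhere).

```python
import torch

def train(net, loader, n_iters=10**5):
    """SGD setup matching the text: momentum 0.9, initial learning rate 1e-6,
    dropped by 20% every 1e4 iterations; MSE is the training loss."""
    opt = torch.optim.SGD(net.parameters(), lr=1e-6, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10_000, gamma=0.8)
    loss_fn = torch.nn.MSELoss()
    it = 0
    while it < n_iters:
        for inputs, target in loader:             # (FBP+MRI channels, MLEM target)
            opt.zero_grad()
            loss = loss_fn(net(inputs), target)
            loss.backward()
            opt.step()
            sched.step()                          # per-iteration learning-rate drop
            it += 1
            if it >= n_iters:
                return
```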

III. Performance Evaluation

One figure of merit is the mean squared error (MSE), defined as $\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}(x_j - y_j)^2$, where $x_j$ and $y_j$ are the image intensities at voxel $j$ of two different images, and $N$ is the total number of voxels. It is also the loss function used in the network training process. Since the proposed model was designed to increase the SNR of PET images beyond the target images, the noise-free MLEM-reconstructed images of the phantoms with 1000 iterations were used as the reference images for the performance evaluation. Using a noise-free reconstruction, instead of the original digital phantom image, allows us to focus on the noise in the image without being affected by the intrinsic resolution of the simulated PET scanner.

Note that the FBP images for all inferences were reconstructed from the simulated sinograms without extra Poisson noise. The overall performance was evaluated by the MSE between the inference and reference images over all slices of the same phantom. To observe the performance of the models with various training input data, the MSE was compared between 1U-net trained with only PET images, 1U-net trained with PET/MRI images, and 3U-net trained with PET/MRI images. The MSE of 1U-net and 3U-net with and without extra Poisson noise in the training data was calculated to show the benefit of adding extra Poisson noise. To convert the SNR improvement into an equivalent count level increase, we reconstructed a series of noisy sinograms at count levels of 100M, 200M, 300M and 400M using MLEM with 1 to 120 iterations. The minimum MSE between these reconstructed images and the reference images across all iterations was plotted as a function of the count level and used as a lookup table. The MSE of 1U-net and 3U-net, with and without extra Poisson noise, between the inference and reference images was then converted to an equivalent count level based on the lookup table. Note that the improvement of the equivalent count level by the DNN is beyond what can be obtained by simply over- or under-iterating the MLEM algorithm. To study the effect of the extra-noise count level on the SNR improvement, extra Poisson noise at count levels ranging from 0.1 to 10⁵ million was added to the training input data, and the MSE of the inferences was plotted as a function of the extra-noise count level.
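
The count-level conversion amounts to inverse interpolation of the lookup table; a minimal sketch, assuming the per-level minimum MSEs have already been computed, with illustrative values in the usage comment:

```python
import numpy as np

def equivalent_count_level(mse_dnn, count_levels, min_mse_per_level):
    """Convert a network's MSE to an equivalent count level using the lookup
    table built from MLEM reconstructions at 100/200/300/400M counts (minimum
    MSE over iterations 1..120 at each level). MSE decreases with counts, so
    both arrays are flipped for np.interp, which needs increasing x values."""
    return np.interp(mse_dnn, min_mse_per_level[::-1], count_levels[::-1])

# e.g. equivalent_count_level(0.7, np.array([100, 200, 300, 400]),
#                             np.array([1.0, 0.8, 0.6, 0.5]))  ->  250.0
```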

The noise reduction performance of the trained models in terms of bias and variance was assessed by processing 100 noisy realizations of each phantom. The bias-square and variance images of the testing phantom were calculated for the comparison between different conditions.
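
As a sketch of this evaluation, the per-voxel bias-squared and variance can be computed over the stack of realizations as follows (NumPy; the realizations and the noise-free MLEM reference are assumed to be loaded as arrays):

```python
import numpy as np

def bias2_variance(realizations, reference):
    """Per-voxel bias-squared and variance over noisy realizations (100 per
    phantom here). `realizations` has shape (n_realizations, H, W);
    `reference` is the noise-free MLEM image with 1000 iterations."""
    mean_img = realizations.mean(axis=0)
    bias2 = (mean_img - reference) ** 2
    variance = realizations.var(axis=0, ddof=1)   # unbiased sample variance
    return bias2, variance
```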

Aside from evaluating the SNR improvement on normal brain images, an artificial lesion with a diameter of 10 mm was inserted into the white matter region of the PET images to assess the contrast-to-noise ratio (CNR) of the inferences. The CNR is defined as the ratio of the difference between the lesion mean and the background mean to the standard deviation of the background. It also serves to assess the impact of mismatched MRI information on the PET image, since the lesion appears only in the PET images and not in the MRI images. The CNR was calculated from one 3D region-of-interest (ROI) drawn on the lesion and two background ROIs at different locations in the white matter region. All the ROIs were 8 mm in diameter, and their locations in the phantom are indicated by yellow circles in Figure 4.
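
Given ROI masks, the CNR computation reduces to a few lines; a minimal sketch, assuming the boolean masks for the 8-mm lesion and (pooled) background ROIs are prepared elsewhere:

```python
def cnr(image, lesion_mask, bg_mask):
    """CNR = (mean(lesion) - mean(background)) / std(background),
    computed over the ROIs described in the text."""
    bg = image[bg_mask]
    return (image[lesion_mask].mean() - bg.mean()) / bg.std()
```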

Figure 4.

An example of the lesion-inserted brain images for inference. The top row is the testing data at the 100M count level and the bottom row shows the phantom images with the ROI locations. (A) FBP image, (B) MLEM image with 50 iterations, (C) T1-weighted MRI image, (D) lesion ROI in transverse view, (E) one of the background ROIs in transverse view, and (F) both background ROIs in sagittal view. No extra Poisson noise was added to the testing data. An artificial lesion was inserted into the white matter region of the PET images with the same gray-matter-to-white-matter contrast, i.e., 4:1. The lesion is 10 mm in diameter and not visible in the MRI images. The diameter of the ROIs is 8 mm.

IV. Computing Platform and Computational Performance

All the trainings and inferences were conducted using Caffe (Jia et al 2014) on an NVIDIA GeForce GTX 1080 Ti GPU with CUDA 8.0 and cuDNN 5.1. The raw training data were converted to the Lightning Memory-Mapped Database (LMDB) format for better I/O efficiency during training. The computing speed of 1U-net training was 2.85 iterations/s, whereas the computing speed of the individual 1U-net training using PET or MRI alone was 4.55 iterations/s and that of the whole 3U-net training was 1.56 iterations/s. In the inference process, the computation time for a pair of 2D PET and MRI images was 10 ms for 1U-net and 20 ms for 3U-net.

RESULTS

Five representative brain phantoms were selected to show the results in this paper, including two training, two validation and one testing phantom. The inferences of the five representative phantoms using 1U-net and 3U-net with/without extra noise are shown in Figure 5 (training data) and Figure 6 (validation/testing data). The MSEs of the individual inferences from 1U-net and 3U-net with various training data are shown in Figure 7, and the MSE results of all the phantoms are provided in Figure S1 of the supplemental materials. The 1U-net trained with only PET images has the highest MSE among all the network predictions, followed by 1U-net with PET/MRI and then 3U-net with PET/MRI. Compared with the MSE of 1U-net trained with only PET images, training with PET/MRI decreased the MSE by 15.5% for 1U-net with extra noise and 4.6% for 1U-net without extra noise, and by 21.3% for 3U-net with extra noise and 10.4% for 3U-net without extra noise. The highest standard deviations of the MSE over the three noise realizations among the testing data were 0.014 (1.78%) and 0.016 (1.83%) for 1U-net with and without extra noise, respectively, and 0.018 (2.53%) and 0.012 (1.66%) for 3U-net with and without extra noise, respectively. The MSE curves as a function of the count level are shown in Figure 8 for the five representative phantoms, and the predicted count levels are summarized in Table 1. The trained DNNs, except for 1U-net without extra noise, improve the equivalent count level: by up to 13.6% for 1U-net with extra noise, and by 65.8% and 28.2% for 3U-net with and without extra noise, respectively. The equivalent count level was further improved, by up to 2.5-fold and 2.9-fold for 1U-net and 3U-net respectively, when the noise augmentation was applied to the training data.

Figure 5.

The inferences and reference images for two training phantoms. From left to right are the inferences of 1U-net trained without and with extra noise, 3U-net trained without and with extra noise, and the MLEM-reconstructed noise-free PET images with 1000 iterations.

Figure 6.

The inferences and reference images for the validation and testing data. From left to right are the inferences of 1U-net trained without and with extra noise, 3U-net trained without and with extra noise, and the MLEM reconstructed noise-free PET images with 1000 iterations.

Figure 7.

Overall MSE of the inferences from 1U-net and 3U-net with various training input data for the five representative phantoms. The first five clusters are the results for the phantoms without extra noise and the second five are the results with extra noise and noise augmentation. The MSEs of FBP and MLEM are used as references to be compared with 1U-net trained with only PET images, 1U-net trained with PET/MRI images and 3U-net trained with PET/MRI images. The error bars indicate the standard deviations derived from three noise realizations for 1U-net and 3U-net trained with PET/MRI images.

Figure 8.

The predicted count level lookup table for the five representative phantoms. The MSE is calculated between the MLEM-reconstructed images with various iteration numbers, from the sinograms at count levels of 100, 200, 300 and 400 million, and the MLEM-reconstructed noise-free PET images with 1000 iterations. The lookup table is used to convert the phantom-wise MSE to a predicted count level for evaluating the SNR improvement.

TABLE 1.

Predicted count level (in millions) for the three validation (vali)/testing phantoms

|                                  | 1U vali#1 | 1U vali#2 | 1U test#1 |
|----------------------------------|-----------|-----------|-----------|
| no extra noise                   | 94.5      | 88.0      | 93.8      |
| extra noise                      | 113.6     | 108.2     | 110.4     |
| extra noise (noise augmentation) | 254.3     | 233.7     | 244.7     |

|                                  | 3U vali#1 | 3U vali#2 | 3U test#1 |
|----------------------------------|-----------|-----------|-----------|
| no extra noise                   | 128.2     | 128.2     | 127.7     |
| extra noise                      | 165.8     | 161.6     | 165.0     |
| extra noise (noise augmentation) | 288.2     | 272.0     | 283.1     |

To evaluate the robustness of the models, the previously trained networks were tested on phantoms with different gray matter (GM) to white matter (WM) ratios, and the MSE results are plotted in Figures S2–S4. Despite the different GM-WM ratios, 3U-net with extra noise and augmentation could still reduce the MSE by at least 45% relative to FBP for all ratios, and by 42%, 37% and 17% relative to MLEM for the ratios of 2:1, 3:1 and 6:1, respectively. When the ratio is greater than the one used in training, both 1U-net and 3U-net are less capable of reducing noise.

Figure 9 shows the prediction MSE curves as a function of the extra-noise level added to the training input data for 1U-net and 3U-net. The MSE at the 1000-million-count extra-noise level was close to that without extra noise, and the MSE is also very high when the extra-noise count level is less than 10 million. It appears that adding the extra noise at the same count level as the original data, i.e., 100 million, is a reasonable choice.

Figure 9.

The MSE curves at different noise levels for the three validation/testing phantoms. The MSE curves on the left are for 1U-net and on the right for 3U-net.

The inferences of the lesion-inserted testing data are shown in Figure 10. The average CNRs of the lesion, with standard deviations, are summarized in Table 2. Compared with the CNR of the MLEM-reconstructed images with 50 iterations, the CNR improved by up to 31.2% and 25.9% for 1U-net and 3U-net trained with extra noise, respectively. When the noise augmentation was applied, the CNR improved by up to 2.7-fold and 1.4-fold for 1U-net and 3U-net, respectively.

Figure 10.

The inferences of the lesion-inserted testing data and the corresponding MLEM-reconstructed images with 50 iterations. From left to right are the inferences of 1U-net without and with extra noise, 3U-net without and with extra noise, and the MLEM-reconstructed PET images with 50 iterations. The lesion was inserted in the right white matter region with the same contrast as the gray matter. The lesions are indicated by red arrows in the MLEM-reconstructed images.

TABLE 2.

Contrast-to-noise ratio of lesion in PET images

|        | FBP        | MLEM it50  | 1U no extra noise | 1U extra noise | 1U extra noise (noise augmentation) | 3U no extra noise | 3U extra noise | 3U extra noise (noise augmentation) |
|--------|------------|------------|-------------------|----------------|-------------------------------------|-------------------|----------------|-------------------------------------|
| vali#1 | 7.1 ± 0.82 | 7.7 ± 0.05 | 7.0 ± 0.63        | 9.2 ± 0.46     | 24.4 ± 0.51                         | 7.9 ± 0.57        | 8.5 ± 0.40     | 16.4 ± 0.44                         |
| vali#2 | 7.3 ± 0.02 | 8.1 ± 0.08 | 7.6 ± 0.13        | 10.6 ± 0.13    | 26.5 ± 3.06                         | 8.5 ± 0.59        | 10.0 ± 0.61    | 17.1 ± 1.73                         |
| test#1 | 6.8 ± 0.31 | 7.2 ± 0.44 | 7.2 ± 0.05        | 9.3 ± 0.03     | 26.8 ± 0.86                         | 7.9 ± 0.17        | 9.1 ± 0.23     | 17.0 ± 0.06                         |

The bias-square and variance images of the testing phantom without and with the lesion are shown in Figure 11 and Figure 13, respectively, with the corresponding profiles given in Figures 12 and 14. The bias-square images are comparable between the different networks and training inputs, and the benefit of 3U-net with extra noise comes from the reduction of variance. Although the variance was further reduced when the noise augmentation was applied, the bias near the lesion increased almost 4-fold compared with that from 1U- and 3U-net trained with extra noise.

Figure 11.

The bias² (top) and variance (bottom) images of test#1 without the lesion. The bias² images are displayed with a maximum scale of 0.03 and the variance images with a maximum scale of 0.001.

Figure 13.

The bias² (top) and variance (bottom) images of test#1 with the lesion. The bias² images are displayed with a maximum scale of 0.03 and the variance images with a maximum scale of 0.001.

Figure 12.

The profiles of the bias² (top) and variance (bottom) images of test#1 without the lesion. The images are displayed on their own minimum and maximum scales.

Figure 14.

The profiles of the bias² (top) and variance (bottom) images of test#1 with the lesion. The images are displayed on their own minimum and maximum scales.

The MSEs of the inferences in the PET/MRI miscoregistration study are shown in Figure 15 for randomly selected offset directions, and in Figure 16 for all offset directions and the noise augmentation. Overall, the proposed method still provides MSE reduction for miscoregistration up to 1 mm in the first scenario and up to 2 mm in the second scenario.

Figure 15.

Overall MSE of the inferences from 1U-net and 3U-net trained with miscoregistered input data along randomly selected offset directions, for the five representative phantoms. The MSE results of 1U-net and 3U-net PET/MRI with and without extra noise for offset sizes of 0.5 mm and 1.0 mm are compared with the references.

Figure 16.

Overall MSE of the inferences from 1U-net and 3U-net trained with miscoregistered input data along all offset directions, and with training input data prepared with noise augmentation (shown only in the right five clusters), for the five representative phantoms. The MSE results of 1U-net and 3U-net PET/MRI with and without extra noise for offset sizes of 0.5 mm, 1.0 mm, and 2.0 mm are compared with the references.

DISCUSSION

Many recent papers have proposed training deep learning models with low-dose images against high-dose images to suppress noise in CT and PET applications. However, the inference performance depends on a large amount of training data to ensure generalization, and the final image quality at inference is limited by the training targets. In some situations, such high-dose images may not be available for training. In consideration of data availability, the question we attempted to answer in this paper is how to maximize the PET SNR improvement without using higher-SNR PET images or data from multiple acquisitions as the training targets. Our goal is to design a DNN that can be trained using existing standard PET images and then be used, together with a co-registered anatomical image, to generate PET images with higher SNR.

Some prior work also proposed using a U-net model for PET denoising (Gong et al 2017b). Instead of being used as a standalone denoising step, the trained U-net model was used to represent PET images within the iterative reconstruction process. That approach increases not only the computational burden but also the complexity of the optimization by combining the deep learning and PET reconstruction objective functions. In contrast, our proposed U-net model is not involved in the reconstruction process and consequently results in faster processing. Moreover, the proposed framework is designed to lower the requirements for training data preparation and is easy to deploy once the model is trained.

Our objective of predicting higher PET SNR without requiring any other high-SNR PET images in fact explains why the training targets could not be used as reference images for the performance evaluation. Although this is a digital brain phantom study, taking the phantom images directly as reference images would be too idealistic, as it would not account for the effect of the image reconstruction algorithm. Therefore, the noise-free PET images reconstructed using MLEM with 1000 iterations were used to numerically evaluate the inference performance. This allows us to evaluate the performance based on noise without worrying about the partial volume effect caused by the simulated PET system.

In terms of the MSE results, 3U-net without extra noise could already provide some PET SNR improvement. The performance of the PET SNR improvement using 3U-net is further elevated when extra noise at the same count level as the original sinograms is added to the training input data. Adding extra noise to the training input data helps the neural network recognize and remove more noise in the inferences. Additionally, the results show that the introduction of extra noise can be exploited to augment the training data and improve the performance. Figure 9 suggests that the MSE reduction is not proportional to the amount of extra noise. Furthermore, too much extra noise can deteriorate the PET SNR because the remaining spatial information of the organ of interest would not be sufficient for the neural network model to correlate the training inputs to the targets. In this work, we only tested adding extra Poisson noise, which is the type of noise associated with the physics of PET acquisitions, and have not yet studied other types of noise. From the results at different noise count levels, we found that the count level of the extra Poisson noise should not be lower than 1/10th of the original counts. Otherwise, it could worsen the SNR of the inferences due to the mismatch between the noise level in the training inputs and that in the inference inputs.

The MSE can be decomposed into bias-square and variance. From Figures 11–14, we found that the bias-square changed only slightly and locally among the conditions, but the variance was reduced more when the extra noise was introduced into the training data for both the 1U-net and 3U-net models. In addition, the 3U-net model is able to reduce the variance more than the 1U-net model, especially in the white matter regions. This is because the pixel value distribution in the white matter regions of the MRI images is more homogeneous than in the gray matter regions (see Figure S7 for the statistics of MRI pixel values), and the 3U-net model can extract more features from the MRI images.

The adapted U-Net model in this work is composed of five max-pooling layers, which extract multi-scale features of the input images that are invariant to local fluctuations, and five element-wise-sum skip connections, which not only transfer more details to the convolutional layers in the decoding process but also make backpropagation more efficient and effective, as in residual network training. In comparison to the original U-Net model proposed by Ronneberger et al in 2015, we added more neurons at the bottom of the U-Net model to extract more features from the multi-modality inputs. To achieve better SNR improvement, the skip connections were modified from the original paper to connect convolutional layers asymmetrically between the encoding and decoding processes, as shown in Figure 2. Compared with symmetric skip connections (Figure S5), the asymmetric skip connections reduced the MSE by 18.3% for 1U-net and 13.8% for 3U-net (Figure S6).

Xiang et al (2017) showed that the combination of PET and T1-weighted MRI images could improve PET image quality because the anatomical information from MRI images helps restore the deteriorated boundaries in low-dose PET images. Our MSE results also show that using only PET images in the training is not sufficient to improve the SNR beyond that of the MLEM-reconstructed images. We further demonstrated that even when PET/MRI images are used in 1U-net without extra noise in the training input data, the MSE is still no better than that of the MLEM-reconstructed images. In other words, given no extra noise in the training inputs, 1U-net is not capable of extracting enough information from MRI to improve the PET SNR beyond the training target image. This can be achieved only when our proposed model, 3U-net, is trained with PET/MRI images. The equivalent count levels in Table 1 provide an easier way to compare the performance between conditions. The superiority of 3U-net could be because more information can be extracted when PET and MRI pass through their own U-Nets without interference from each other, and because the weights and biases of the first two U-Nets in 3U-net were initialized from these two individual U-Nets trained with PET and MRI images. Therefore, 3U-net can extract more features from the PET and MRI images than training the PET/MRI images together in only one U-Net. The inferences of 3U-net in Figure 6 show better noise suppression than 1U-net and are slightly less blurry when trained with extra noise than without.

In addition to training with perfectly coregistered PET/MRI images, the noise reduction capability of the proposed 3U-net was evaluated with miscoregistered PET/MRI training data. Figure 15 shows that the offset between PET and MRI images is indeed an obstacle to the SNR enhancement, although 3U-net could still reduce the MSE compared to that of MLEM when the offset is no greater than 1.0 mm. The performance of the noise reduction could be restored, as shown in Figure 16, if the training data are augmented by translating the MRI images in different directions around the original position. This offers an alternative way, in practice, to mitigate the negative impact of PET/MRI miscoregistration on the SNR enhancement using 3U-net. However, neither 1U-net nor 3U-net could achieve much improvement when the miscoregistration is greater than 1.0 mm.

Since the visibility of a lesion is important in medical imaging, we studied whether a lesion appearing only in the PET images could be preserved in the inferences, and whether its CNR could be improved, when the neural network model is not trained with such images. The lesion in the inferences in Figure 10 did not disappear, and the CNR of the lesion is even better than that of the MLEM-reconstructed images with 50 iterations. The better lesion CNR is mainly attributed to the much smoother background, while the contrast of the lesion is also decreased. The lesion contrast is lowest when the noise augmentation was applied during training. One possible explanation is that the MRI images match the training targets perfectly in the training data, and the MRI features are weighted more heavily when noisier PET input images are used.

Although our proposed model does not require relatively high-dose PET images for training, some limitations should still be borne in mind. Without relatively high-dose PET images, some information that is missing in both the training inputs and targets cannot be recovered. The SNR improvement attributed to adding extra Poisson noise to the training inputs might be compromised if either of the following conditions occurs: (1) the noise in the data is not Poisson distributed, or (2) so much noise is added that the mean and structural information of the images are deteriorated. The trained model needs to be retrained and/or fine-tuned if the conditions of the training data are not consistent with those of the inference data, for instance, data from different PET scanners and/or tracers, MRI sequences, body sections/organs or diseases.

The proposed model is designed for easy implementation and deployment on existing data. To apply the proposed model in clinical practice, the existing PET data need to be reconstructed again with additional Poisson noise in the sinograms and paired with the existing reconstructions without additional Poisson noise to train the model. If a model pre-trained with either simulated or real data is available, transfer learning can be conducted when the conditions of the training data change, such as different tracers or scanners.

CONCLUSION

We have shown that our proposed DNN framework can improve the PET SNR without requiring higher-SNR PET images. The combination of our proposed DNN model and coregistered MRI images can improve the PET SNR beyond the training targets, as measured against noise-free reconstructions. In future work, we will investigate the SNR improvement of this framework with real patient data and the sensitivity of 3U-net to mismatches such as misalignment between PET and MRI images.

Supplementary Material

supplementary material

ACKNOWLEDGMENT

This work is supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under grant no. R01EB000194.

References

1. Adler J and Oktem O 2017 Learned primal-dual reconstruction arXiv:1707.06474
2. Aubert-Broche B, Griffin M, Pike GB, Evans AC and Collins DL 2006 Twenty new digital brain phantoms for creation of validation image data bases IEEE Trans. Med. Imaging 25 1410–6
3. Baete K, Nuyts J, Van Laere K, Van Paesschen W, Ceyssens S, De Ceuninck L, Gheysens O, Kelles A, Van Den Eynden J, Suetens P and Dupont P 2004 Evaluation of anatomy based reconstruction for partial volume correction in brain FDG-PET Neuroimage 23 305–17
4. Bowsher JE, Yuan H, Hedlund LW, Turkington TG, Akabani G, Badea A, Kurylo WC, Wheeler CT, Cofer GP, Dewhirst MW and Johnson GA 2004 Utilizing MRI information to estimate F18-FDG distributions in rat flank tumors IEEE Nucl. Sci. Symp. Conf. Rec. 4 2488–92
5. BrainWeb Simulated brain database Online: http://www.bic.mni.mcgill.ca/brainweb/
6. Buades A, Coll B and Morel J-M 2005 A non-local algorithm for image denoising IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR 2005) 2 60–5
7. Cheng-Liao J and Qi J 2011 PET image reconstruction with anatomical edge guided level set prior Phys. Med. Biol. 56 6899–918
8. Chlewicki W, Hermansen F and Hansen SB 2004 Noise reduction and convergence of Bayesian algorithms with blobs based on the Huber function and median root prior Phys. Med. Biol. 49 4717–30
9. Collins DL, Zijdenbos AP, Kollokian V, Sled JG, Kabani NJ, Holmes CJ and Evans AC 1998 Design and construction of a realistic digital brain phantom IEEE Trans. Med. Imaging 17 463–8
10. Dolz J, Betrouni N, Quidet M, Kharroubi D, Leroy HA, Reyns N, Massoptier L and Vermandel M 2016 Stacking denoising auto-encoders in a deep network to segment the brainstem on MRI in brain cancer patients: a clinical study Comput. Med. Imaging Graph. 52 8–18
11. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM and Thrun S 2017 Dermatologist-level classification of skin cancer with deep neural networks Nature 542 115–8
12. Glorot X and Bengio Y 2010 Understanding the difficulty of training deep feedforward neural networks Proc. 13th Int. Conf. Artif. Intell. Stat. 9 249–56
13. Gong K, Cheng-Liao J, Wang G, Chen KT, Catana C and Qi J 2017a Direct Patlak reconstruction from dynamic PET data using the kernel method with MRI information based on structural similarity IEEE Trans. Med. Imaging 1–11
14. Gong K, Guan J, Kim K, Zhang X, El Fakhri G, Qi J and Li Q 2017b Iterative PET image reconstruction using convolutional neural network representation arXiv:1710.03344
15. Han X 2017 MR-based synthetic CT generation using a deep convolutional neural network method Med. Phys. 44 1408–19
16. Han Y, Yoo J and Ye JC 2016 Deep residual learning for compressed sensing CT reconstruction via persistent homology analysis arXiv:1611.06391
17. He K, Zhang X, Ren S and Sun J 2015 Deep residual learning for image recognition arXiv:1512.03385v1
18. Heiss W-D 2016 Hybrid PET/MR imaging in neurology: present applications and prospects for the future J. Nucl. Med. 57 993–5
19. Hsiao I-T, Rangarajan A and Gindi G 2003 A new convex edge-preserving median prior with applications to tomography IEEE Trans. Med. Imaging 22 580–5
20. Hubbell JH and Seltzer SM 2004 Tables of X-ray mass attenuation coefficients and mass energy-absorption coefficients (version 1.4) Natl. Inst. Stand. Technol.
21. Hudson HM and Larkin RS 1994 Accelerated image reconstruction using ordered subsets of projection data IEEE Trans. Med. Imaging 13 601–9
22. Ioffe S and Szegedy C 2015 Batch normalization: accelerating deep network training by reducing internal covariate shift arXiv:1502.03167v3
23. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S and Darrell T 2014 Caffe: convolutional architecture for fast feature embedding arXiv:1408.5093
24. Kang E, Min J and Ye JC 2016 A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction Med. Phys. 44 360–75
25. Kim K, Wu D, Gong K, Dutta J, Kim JH, Son YD, Kim HK, El Fakhri G and Li Q 2018 Penalized PET reconstruction using deep learning prior and local linear fitting IEEE Trans. Med. Imaging 1–10
26. Krizhevsky A, Sutskever I and Hinton GE 2012 ImageNet classification with deep convolutional neural networks Adv. Neural Inf. Process. Syst. 1097–105
27. LeCun Y, Bengio Y and Hinton G 2015 Deep learning Nature 521 436–44
28. Lehtinen J, Munkberg J, Hasselgren J, Laine S, Karras T, Aittala M and Aila T 2018 Noise2Noise: learning image restoration without clean data arXiv:1803.04189v1
29. Mao X, Shen C and Yang Y 2016 Image restoration using convolutional auto-encoders with symmetric skip connections arXiv:1606.08921v3
30. Mehranian A, Rahmim A, Ay MR, Kotasidis FA and Zaidi H 2012 An ordered-subsets proximal preconditioned gradient algorithm for total variation regularized PET image reconstruction IEEE Nucl. Sci. Symp. Conf. Rec. 40 3375–82
31. Mehranian A, Zaidi H and Reader AJ 2017 MR-guided joint reconstruction of activity and attenuation in brain PET-MR Neuroimage 162 276–88
32. Nuyts J 2007 The use of mutual information and joint entropy for anatomical priors in emission tomography IEEE Nucl. Sci. Symp. Conf. Rec. 6 4149–54
33. Nuyts J, Michel C and Dupont P 2001 Maximum-likelihood expectation-maximization reconstruction of sinograms with arbitrary noise distribution using NEC-transformations IEEE Trans. Med. Imaging 20 365–75
34. Ronneberger O, Fischer P and Brox T 2015 U-Net: convolutional networks for biomedical image segmentation arXiv:1505.04597v1
35. Shepp LA and Vardi Y 1982 Maximum likelihood reconstruction for emission tomography IEEE Trans. Med. Imaging 1 113–22
36. Simonyan K and Zisserman A 2014 Very deep convolutional networks for large-scale image recognition arXiv:1409.1556
37. Somayajula S, Asma E and Leahy RM 2005 PET image reconstruction using anatomical information through mutual information based priors IEEE Nucl. Sci. Symp. Conf. Rec. vol 5
38. Somayajula S, Panagiotou C, Rangarajan A, Li Q, Arridge SR and Leahy RM 2011 PET image reconstruction using information theoretic anatomical priors IEEE Trans. Med. Imaging 30 537–49
39. Sze V, Chen Y-H, Yang T-J and Emer J 2017 Efficient processing of deep neural networks: a tutorial and survey arXiv:1703.09039v2
40. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A 2015 Going deeper with convolutions Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 1–9
41. Tahaei MS, Reader AJ and Collins DL 2016 MR-guided PET image denoising IEEE Nucl. Sci. Symp., Med. Imaging Conf. and Room-Temperature Semiconductor Detector Workshop (NSS/MIC/RTSD) pp 1–3
42. Tang J and Rahmim A 2015 Anatomy assisted PET image reconstruction incorporating multi-resolution joint entropy Phys. Med. Biol. 60 31–48
43. Tang J, Wang Y, Yao R and Ying L 2014 Sparsity-based PET image reconstruction using MRI learned dictionaries 1087–90
44. Vardi Y, Shepp L and Kaufman L 1985 A statistical model for positron emission tomography J. Am. Stat. Assoc. 80 8–20
45. Wang G and Qi J 2015 PET image reconstruction using kernel method IEEE Trans. Med. Imaging 34 61–71
46. Wang Y, Ma G, An L, Shi F, Zhang P, Lalush DS, Wu X, Pu Y, Zhou J and Shen D 2017 Semisupervised tripled dictionary learning for standard-dose PET image prediction using low-dose PET and multimodal MRI IEEE Trans. Biomed. Eng. 64 569–79
47. Wang Y, Zhang P, An L, Ma G, Kang J, Shi F, Wu X, Zhou J, Lalush DS, Lin W and Shen D 2016 Predicting standard-dose PET image from low-dose PET and multimodal MR images using mapping-based sparse representation Phys. Med. Biol. 61 791–812
48. Wehrl HF, Hossain M, Lankes K, Liu C-C, Bezrukov I, Martirosian P, Schick F, Reischl G and Pichler BJ 2013 Simultaneous PET-MRI reveals brain function in activated and resting state on metabolic, hemodynamic and multiple temporal scales Nat. Med. 19 1184–9
49. Wu D, Kim K, El Fakhri G and Li Q 2017 Iterative low-dose CT reconstruction with priors trained by neural network IEEE Trans. Med. Imaging 36 2479–86
50. Xiang L, Qiao Y, Nie D, An L, Lin W, Wang Q and Shen D 2017 Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI Neurocomputing 267 406–16
51. Yang B, Ying L and Tang J 2018 Artificial neural network enhanced Bayesian PET image reconstruction IEEE Trans. Med. Imaging 37 1297–309
52. Zhang X, Zhou J and Cherry SR 2017 Quantitative image reconstruction for total-body PET imaging using the 2-meter long EXPLORER scanner Phys. Med. Biol. 62 2465–85
53. Zhang Y and Yu H 2017 Convolutional neural network based metal artifact reduction in X-ray computed tomography arXiv:1709.01581
