Abstract
Positron Emission Tomography (PET) is a functional imaging modality widely used in neuroscience studies. To obtain meaningful quantitative results from PET images, attenuation correction is necessary during image reconstruction. For PET/MR hybrid systems, attenuation correction is challenging because Magnetic Resonance (MR) images do not directly reflect attenuation coefficients. To address this issue, we present deep neural network methods that derive continuous attenuation coefficients for brain PET imaging from MR images. With only Dixon MR images as the network input, the existing U-net structure was adopted, and analysis using forty patient data sets shows that it is superior to other Dixon-based methods. When both Dixon and zero echo time (ZTE) images are available, we propose a modified U-net structure, named GroupU-net, that efficiently makes use of both Dixon and ZTE information through group convolution modules when the network goes deeper. Quantitative analysis based on fourteen real patient data sets demonstrates that both network approaches perform better than the standard methods, and that the proposed network structure further reduces the PET quantification error compared to the U-net structure.
1. Introduction
Positron Emission Tomography (PET) can produce three-dimensional images of biochemical processes in the human body by using specific radioactive tracers. It has wide applications in neuroscience studies, such as measurement of metabolism for brain tumor imaging, dopamine neurotransmitter imaging related to addiction, β-amyloid and tau imaging in Alzheimer’s disease, and translocator protein (TSPO) imaging related to microglial activation. Due to various physical degradation factors, corrections for randoms, scatter, normalization and attenuation must be included in the reconstruction process to obtain meaningful quantitative results. For attenuation correction, information from computed tomography (CT) has been treated as the reference standard, reflecting the attenuation coefficients at 511 keV after bilinear scaling (Kinahan et al 2003).
Recently, PET/MR systems have begun to be adopted clinically due to MR’s excellent soft-tissue contrast and its ability to perform functional imaging. In addition, simultaneously acquired MR images can provide useful information for PET motion compensation (Catana et al 2011) and partial volume correction (Gong et al 2017c). One concern is that the MR signal does not directly reflect attenuation coefficients and is hard to use for attenuation correction without approximation. Many methods have been proposed to generate the attenuation map from T1-weighted, Dixon, ultra-short echo time (UTE) or zero echo time (ZTE) MR images, and they can broadly be grouped into four categories. The first category is segmentation-based methods: the MR image is segmented into different tissue classes and the corresponding attenuation coefficients are assigned to produce the attenuation map (Martinez-Möller et al 2009, Keereman et al 2010, Berker et al 2012, Ladefoged et al 2015, Sekine et al 2016, Leynes et al 2017, Khalifé et al 2017, Yang et al 2017). Another category relies on an atlas generated from prior patients’ CT and MR pairs; a pseudo CT is created by non-rigidly registering the atlas to the patient MR images (Wollenweber et al 2013, Burgos et al 2014, Izquierdo-Garcia et al 2014, Yang et al 2017a). With the availability of time-of-flight (TOF) information, emission-based methods have been developed to estimate the activity image and the attenuation map simultaneously, either without the use of MR information (Defrise et al 2012, Rezaei et al 2012, Li et al 2017) or aided by MR information (Mehranian and Zaidi 2015, Kim et al 2016, Mehranian et al 2017). Finally, there are efforts adopting machine learning approaches to pseudo CT generation driven by prior MR and CT pairs, such as the random forest (Huynh et al 2016) and neural network methods (Han 2017, Nie et al 2017, Liu et al 2017, Leynes et al 2017b).
Over the past several years, deep neural networks have been widely and successfully applied to computer vision tasks because of the availability of large data sets, advances in optimization algorithms and the emergence of effective network structures. Recently, they have been applied to medical imaging, such as image denoising (Wang et al 2016, Kang et al 2016, Chen et al 2017), image reconstruction (Wu et al 2017a, Gong et al 2017b) and end-to-end lesion detection (Wu et al 2017b). Several pioneering works have shown that neural networks can be employed to generate pseudo CT images from T1-weighted MR images for the brain region, with evaluations of the pseudo CT image quality only (Han 2017, Nie et al 2017). Going one step further, Liu et al (2017) used a convolutional auto-encoder (CAE) to generate CT tissue labels (air, bone, and soft tissue) from T1-weighted MR images and evaluated its performance on PET images. In that work an additional CT segmentation step was needed and the attenuation coefficients were assigned based on tissue labels, so the resulting attenuation maps are not continuous. Recently Leynes et al (2017b) combined ZTE and Dixon images to generate the pseudo CT for the pelvis region using the U-net structure (Ronneberger et al 2015).
In this work, we focus on using neural network based methods to predict the continuous attenuation map specifically for brain PET imaging under two scenarios:
When there are only Dixon MR images available, we adopted the U-net structure (Ronneberger et al 2015) to generate the pseudo CT images. Forty patients’ data sets were used in the experiment and cross-validated to evaluate the performance. The segmentation and atlas methods based on Dixon MR images provided by the vendor were used as comparison methods;
When both Dixon and ZTE MR images are available, we propose a new network structure based on group convolution modules to combine ZTE and Dixon information more efficiently. Fourteen patient data sets with both Dixon and ZTE images were employed in the experiments. The ZTE segmentation method provided by the vendor was adopted as the comparison method.
The main contributions of this paper include (1) using deep neural networks to generate continuous attenuation maps for brain PET imaging; (2) proposing a new network structure to generate the attenuation maps from multiple MR inputs; and (3) a comprehensive quantitative comparison with the standard methods.
2. Method
2.1. PET attenuation model
For PET image reconstruction, the measured sinogram data y ∈ ℝ^{M×1} can be modeled as a collection of independent Poisson random variables whose mean ȳ is related to the unknown image x ∈ ℝ^{N×1} through an affine transform
$$\bar{y} = P x + s + r, \qquad (1)$$
where P ∈ ℝ^{M×N} is the detection probability matrix, s ∈ ℝ^{M×1} is the expectation of scattered events, and r ∈ ℝ^{M×1} denotes the expectation of random coincidences. M is the number of lines of response (LOR) and N is the number of pixels in image space. The reconstructed image quality strongly depends on the accuracy of the detection probability matrix P, which can be decomposed as (Qi et al 1998)
$$P = N A B G R, \qquad (2)$$
where G ∈ ℝ^{M×N} is the geometric projection matrix whose element g_{ij} denotes the probability of a photon pair produced in voxel j reaching the front faces of detector pair i, R ∈ ℝ^{N×N} models the image-domain blurring effects, B ∈ ℝ^{M×M} is the sinogram-domain blurring matrix (Gong et al 2017a), the diagonal matrix N ∈ ℝ^{M×M} contains the normalization effects, and the diagonal matrix A ∈ ℝ^{M×M} models the attenuation factors. The ith diagonal element of the attenuation matrix A is calculated as
$$a_i = \exp\Big(-\sum_{j=1}^{N} l_{ij}\,\mu_j\Big), \qquad (3)$$
where μ ∈ ℝ^{N×1} is the attenuation map and l_{ij} denotes the intersection length of LOR i with voxel j. In PET/CT, CT images are used for attenuation map generation by the bilinear scaling method (Carney et al 2006)
$$\mu_j = \begin{cases} 9.6\times10^{-5}\,(\mathrm{HU}_j + 1000), & \mathrm{HU}_j \le \mathrm{Threshold} \\ a\,(\mathrm{HU}_j + 1000) + b, & \mathrm{HU}_j > \mathrm{Threshold}. \end{cases} \qquad (4)$$
Here HU_j is the CT number (in Hounsfield units) of voxel j, and a, b and Threshold are values that depend on the CT tube voltage and are given in Carney et al (2006).
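As a concrete illustration of equations (3) and (4), the following sketch (not part of the original pipeline) maps CT numbers to 511 keV attenuation coefficients and then forms the LOR attenuation factors; the constants a, b and Threshold are left as arguments since their values depend on the CT tube voltage (Carney et al 2006), and the intersection-length matrix is assumed to be precomputed.

```python
import numpy as np

def hu_to_mu(hu, a, b, threshold):
    """Bilinear scaling (equation (4)): CT numbers (HU) to 511 keV
    attenuation coefficients (cm^-1); a, b and threshold depend on the
    CT tube voltage (Carney et al 2006)."""
    hu = np.asarray(hu, dtype=np.float64)
    mu = np.where(hu <= threshold,
                  9.6e-5 * (hu + 1000.0),   # soft-tissue branch
                  a * (hu + 1000.0) + b)    # bone branch
    return np.clip(mu, 0.0, None)           # air (-1000 HU) maps to 0

def attenuation_factors(mu, lengths):
    """Diagonal of A in equation (3): a_i = exp(-sum_j l_ij * mu_j).
    `lengths` is an M x N (sparse or dense) matrix of intersection
    lengths of each LOR with each voxel, in cm."""
    return np.exp(-lengths.dot(mu.ravel()))
```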
2.2. Pseudo CT generation using deep neural network
The basic module of a convolutional neural network includes a convolution layer and an activation layer. The input and output relationship of the ith module can be denoted as
$$y_i = g(w_i \circledast y_{i-1} + b_i), \qquad (5)$$
where y_{i−1} ∈ ℝ^{N×N×C} is the module input with spatial size N × N and C channels, y_i ∈ ℝ^{N×N×H} denotes the module output with spatial size N × N and H channels, w_i ∈ ℝ^{M×M×C×H} is the convolutional filter with kernel width M, b_i ∈ ℝ^{1×H} is the bias term, ⊛ stands for the convolution operation, and g represents the non-linear activation function. The rectified linear unit (ReLU), defined as g(x) = max(x, 0), is employed as the activation function. To stabilize and accelerate deep network training, batch normalization (Ioffe and Szegedy 2015) is often added after the convolution operation. After stacking L modules together, the network output can be calculated as
$$y_{\mathrm{out}} = g\big(w_L \circledast g(w_{L-1} \circledast \cdots\, g(w_1 \circledast y_0 + b_1) \cdots + b_{L-1}) + b_L\big), \qquad (6)$$

where y_0 denotes the network input.
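A minimal Keras sketch of the basic module in equation (5), with batch normalization inserted after the convolution as described above; the layer widths and helper names are illustrative rather than taken from the paper.

```python
import tensorflow as tf

def conv_module(x, n_features, kernel_size=3, stride=1):
    """One basic module: convolution -> batch normalization -> ReLU
    (equation (5), with batch normalization added after the convolution)."""
    x = tf.keras.layers.Conv2D(n_features, kernel_size,
                               strides=stride, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

def stack_modules(x, widths):
    """Equation (6): the network output is obtained by composing
    L such modules, e.g. widths = [16, 16, 32]."""
    for w in widths:
        x = conv_module(x, w)
    return x
```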
In this work, MR images are treated as the network input and pseudo CT images are the output of the network. The network is trained on previously acquired MR and CT pairs from different patients, with the objective function
$$\mathcal{L} = \|\mathrm{CT}_{\mathrm{true}} - y_{\mathrm{out}}(\mathrm{MR})\|_1, \qquad (7)$$
which is the L1 norm of the difference between the ground-truth CT image CT_true and the network output y_out(MR). We also tried the L2 norm and found that the L1 norm produces less blurred structures.
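The training objective of equation (7) reduces to a mean absolute error between the ground-truth CT and the network prediction; a one-line TensorFlow version is shown below (the L2 alternative mentioned above would square the difference instead).

```python
import tensorflow as tf

def l1_loss(ct_true, ct_pred):
    """Equation (7): L1 norm of CT_true - y_out(MR), averaged over
    voxels and batch."""
    return tf.reduce_mean(tf.abs(ct_true - ct_pred))
```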
2.2.1. Single input
In many cases, only one MR sequence is available for attenuation correction, either T1-weighted, Dixon or UTE/ZTE. The network implemented for this scenario is based on the U-net structure (Ronneberger et al 2015). The overall network architecture is summarized in Fig. 1. It consists of repeated applications of 1) a 3×3 convolutional layer, 2) a batch normalization layer, 3) a ReLU layer, 4) a convolutional layer with stride 2 for down-sampling, 5) a transposed convolutional layer with stride 2 for up-sampling, and 6) a mapping layer that concatenates the left-side features with the right side. The input has nine channels with a spatial size of 144 × 144, and the bottom layer has a spatial size of 9 × 9. The number of features N after the first convolution module is 16. To make full use of the axial information, nine neighboring axial slices were stacked into the nine input channels to reduce axial aliasing artifacts. As only Dixon images are used as the single input, this method is referred to as Dixon-Unet.
Figure 1.

The schematic diagram of the U-net architecture. Numbers on top of each module indicate the number of feature channels. Numbers on the left side of each module indicate the spatial input size. N is the number of features after the first convolution module. For the proposed GroupU-net structure, the convolution modules inside the dashed box are replaced by group convolution modules as indicated in Fig. 2. The group module is only used when the block input has ≥ 4N feature channels. The number of groups in the group convolution module is set to N.
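To make the architecture in Fig. 1 concrete, a compact functional sketch is given below. It assumes four down-sampling stages (144 → 72 → 36 → 18 → 9) and a doubling of the feature count at each stage; the exact per-level filter counts are not spelled out in the text, so that pattern is an assumption.

```python
import tensorflow as tf

L = tf.keras.layers

def build_unet(n_slices=9, size=144, N=16):
    """Sketch of the single-input U-net: nine neighboring axial slices as
    input channels, 3x3 conv + BN + ReLU blocks, stride-2 convolutions
    for down-sampling, transposed convolutions for up-sampling, and
    concatenation of the left-side features with the right side."""
    def block(x, f):
        x = L.Conv2D(f, 3, padding='same')(x)
        x = L.BatchNormalization()(x)
        return L.ReLU()(x)

    inp = L.Input((size, size, n_slices))
    x, skips, f = inp, [], N
    for _ in range(4):                                    # 144 -> 72 -> 36 -> 18 -> 9
        x = block(block(x, f), f)
        skips.append(x)
        x = L.Conv2D(f, 3, strides=2, padding='same')(x)  # down-sample
        f *= 2
    x = block(block(x, f), f)                             # 9 x 9 bottom layer
    for skip in reversed(skips):
        f //= 2
        x = L.Conv2DTranspose(f, 3, strides=2, padding='same')(x)  # up-sample
        x = L.Concatenate()([x, skip])                    # copy left-side features
        x = block(block(x, f), f)
    out = L.Conv2D(1, 1, padding='same')(x)               # pseudo CT slice
    return tf.keras.Model(inp, out)
```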
2.2.2. Multiple inputs
For current PET/MR scanners, more than one MR sequence can be acquired for attenuation correction. For example, both Dixon and ZTE MR images are available on the GE SIGNA scanner. When multiple MR images are included as network input, the number of features N after the first convolution module should be enlarged to digest the additional information. For the U-net structure, the number of trainable parameters increases quadratically with N, and overfitting can be a serious pitfall when the network complexity is increased without providing enough training pairs. Previous studies have shown that designing a "wider" network can make more efficient use of model parameters (Szegedy et al 2016, Chollet 2016, Xie et al 2017). To preserve the network capacity while restricting the network complexity, the group convolution module illustrated in Fig. 2 was adopted to replace the convolution module when the network goes deeper. The group convolution module is similar to the module presented in the ResNeXt network structure (Xie et al 2017). Traditionally, a convolution kernel considers cross-channel correlations and spatial correlations together. The group convolution module in Fig. 2 first deals with the cross-channel correlation through a 1×1 convolution and then handles the spatial correlation in smaller groups. The hypothesis is that when the network goes deeper, the spatial correlations and the cross-channel correlations can be decoupled (Chollet 2016). In our implementation, N is set to 19 for the U-net with both Dixon and ZTE as input. For the GroupU-net, the number of groups is set to N, the group convolution module is used only when the input channel size is ≥ 4N, and N is set to 32 so that the number of trainable parameters matches that of the U-net (2.7 million). These two methods are labeled DixonZTE-Unet and DixonZTE-GroupUnet, respectively.
Figure 2.

The schematic diagram of the group convolution module. The ReLU and Batch normalization layers are added after each convolution operation during implementation.
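A sketch of the group convolution module in Fig. 2, assuming the ResNeXt-style layout described in the text: a 1×1 convolution first mixes information across channels, then a 3×3 convolution applied within separate channel groups handles the spatial correlations. The `groups` argument of `tf.keras.layers.Conv2D` (available from TensorFlow 2.3) is used here, and the number of output features must be divisible by the number of groups.

```python
import tensorflow as tf

L = tf.keras.layers

def group_conv_module(x, n_features, n_groups):
    """Group convolution module: 1x1 cross-channel convolution followed
    by a grouped 3x3 spatial convolution; BN and ReLU follow each
    convolution, as noted in the Fig. 2 caption."""
    x = L.Conv2D(n_features, 1, padding='same')(x)   # cross-channel mixing
    x = L.ReLU()(L.BatchNormalization()(x))
    x = L.Conv2D(n_features, 3, padding='same',
                 groups=n_groups)(x)                 # spatial conv in groups
    x = L.ReLU()(L.BatchNormalization()(x))
    return x
```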
3. Experimental evaluations
3.1. Data sets
The patient study was approved by the Institutional Review Board and all patients signed an informed consent form before the examinations. In total, forty patients scanned from 2014 to 2016 were included in this study. All patients had a whole-body PET/CT scan, followed by an additional PET/MR scan without a second tracer administration. For both PET/CT and PET/MR, only data acquired at the bed position covering the head were used. No brain pathology was reported for any of the patients. The average patient weight was 73.2 ± 17.0 kg (range, 39.5-109.8 kg). For PET/MR, the average scan duration of the whole brain was 224.6 ± 133.7 s (range, 135-900 s). All forty patient data sets have Dixon MR images, and fourteen of them have additionally acquired ZTE MR images. Thirty-seven of the forty patients had FDG scans. The average administered dose of FDG was 305.2 ± 73.9 MBq (range, 170.2-468.1 MBq). Twelve of the fourteen patients with additional ZTE scans had FDG PET scans.
PET/CT examinations were performed on a GE Discovery PET/CT scanner or a Siemens Biograph HiRez 16 PET/CT scanner. For CT images acquired on the GE Discovery PET/CT scanner, the reconstruction has an axial field of view (FOV) of 700 mm and a matrix size of 512 × 512 with a voxel size of 2.73 × 2.73 × 3.75 mm3. For CT images acquired on the Siemens Biograph HiRez 16 PET/CT system, the reconstruction has an axial FOV of 500 mm and a matrix size of 512 × 512 with a voxel size of 1.95 × 1.95 × 5.00 mm3. PET/MR examinations were performed on the GE SIGNA PET/MR system (Grant et al 2016). The transaxial and axial FOVs of the PET/MR system are 600 mm and 250 mm, respectively. The crystal size is 4.0 × 5.3 × 25 mm3. PET images were reconstructed using the ordered subset expectation maximization (OSEM) algorithm with TOF information. The point spread function (PSF) (Alessio et al 2010) was also included to improve the image quality. Two iterations with sixteen subsets were run. The voxel size is 1 × 1 × 2.87 mm3 and the image size is 300 × 300 × 89. Dixon MR images were acquired using the head and neck coil array (repetition time, ~4 ms; first/second echo time, 1.3/2.6 ms; flip angle, 5°; acquisition time, 18 s), with an image size of 256 × 256 × 120 and a voxel size of 1.93 × 1.93 × 2.6 mm3. ZTE images were acquired using the same head and neck coil array (repetition time, ~0.7 ms; echo time, 0 ms; flip angle, 0.6°; transmit/receive switching delay, 28 ms; readout duration, 440 ms; acquisition time, 41 s), and the reconstructed image size is 110 × 110 × 110 with a voxel size of 2.4 × 2.4 × 2.4 mm3 (Yang et al 2017).
3.2. Implementation details
When preparing the training pairs, we first registered the CT images and ZTE images (if applicable) to the Dixon MR images through rigid transformation using the ANTs software (Avants et al 2009). Random rotation and permutation were then performed on the training pairs to avoid over-fitting. Fig. 3 shows example pairs from different patient data sets used in the training phase for the multiple-input scenario. When using only Dixon images as the input, in order to make full use of all the data sets in both the training and testing periods, the forty patient data sets were randomly separated into five groups. For each group, the eight data sets in that group were used for testing and the remaining thirty-two were used for training. Fourteen of the forty patients have additional ZTE scans. When using both Dixon and ZTE images as inputs, the fourteen patient data sets were randomly separated into seven groups, and for each group the network was trained using the data sets from the other groups (a sketch of this split is given after Fig. 3).
Figure 3.

Examples of the training pairs used in the network training. Top row is the CT label image, middle and bottom rows are the corresponding Dixon MR images (middle) and ZTE MR images (bottom).
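The leave-one-group-out cross-validation described above amounts to a random partition of patient indices; a minimal sketch (indices only) is shown below, with five groups of eight for the Dixon-only experiment and seven groups of two for the Dixon + ZTE experiment.

```python
import numpy as np

def make_folds(n_patients, n_groups, seed=0):
    """Randomly split patient indices into groups; for each fold the
    patients in one group form the test set and all remaining patients
    form the training set."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n_patients), n_groups)
    return [(np.setdiff1d(np.arange(n_patients), g), np.sort(g)) for g in groups]

dixon_folds = make_folds(40, 5)       # Dixon-only: 5 groups of 8
dixon_zte_folds = make_folds(14, 7)   # Dixon + ZTE: 7 groups of 2
```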
The network structures were implemented in TensorFlow using the Adam algorithm as the optimizer (Kingma and Ba 2014). The learning rate and the decay rates were the default settings in TensorFlow. For the single-input case the batch size was set to 60, and for the multiple-input case the batch size was set to 30. Training was run for 1000 epochs in both cases, as the training cost function becomes steady after 1000 epochs.
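A training-loop sketch consistent with these settings, reusing the `build_unet` and `l1_loss` sketches above; `train_mr` and `train_ct` are hypothetical arrays of co-registered, augmented MR slice stacks and the corresponding CT slices.

```python
import tensorflow as tf

model = build_unet()                                   # from the earlier sketch
model.compile(optimizer=tf.keras.optimizers.Adam(),    # default learning/decay rates
              loss=l1_loss)
model.fit(train_mr, train_ct,                          # hypothetical training arrays
          batch_size=60,                               # 30 for the multiple-input case
          epochs=1000)
```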
3.3. Reference methods
From the Dixon MR images, water and fat tissues were segmented and the corresponding attenuation coefficients were assigned to generate the attenuation map; this method is labeled Dixon-Seg. Alternatively, the patient MR image can be non-rigidly registered to an MR template built from prior patients' MR and CT pairs; the generated pseudo CT contains air, soft tissue, sinus and bone. This method is named Dixon-Atlas (Wollenweber et al 2013). For the segmentation method using ZTE images, the ZTE images were first N4 bias corrected (Tustison et al 2010) and then normalized by the median tissue value; thresholding was then performed to segment the images into air, soft tissue and bone regions. This method is labeled ZTE-Seg. All three methods are available in the PET reconstruction toolbox provided by the vendor.
3.4. Evaluation metrics
The predicted pseudo CT image quality was evaluated using the relative validation loss, defined as
$$\mathrm{Loss}_{\mathrm{CT}} = \frac{\|\mathrm{CT}_{\mathrm{pseudo}} - \mathrm{CT}_{\mathrm{true}}\|_1}{\|\mathrm{CT}_{\mathrm{true}}\|_1} \times 100\%, \qquad (8)$$
where CT_pseudo is the CT generated by each method and CT_true denotes the ground-truth CT. Bone regions were also quantified using the Dice index, defined as
$$\mathrm{Dice} = \frac{2\,|B_{\mathrm{pseudo}} \cap B_{\mathrm{true}}|}{|B_{\mathrm{pseudo}}| + |B_{\mathrm{true}}|}, \qquad (9)$$
where B_pseudo and B_true are the bone masks of the pseudo CT and the ground-truth CT, respectively. Regions with attenuation coefficients higher than 0.1083 cm−1 (200 HU) were classified as bone. For PET image quantification, the relative PET error was used, defined as
$$\mathrm{Error}_{\mathrm{PET}} = \frac{\sum_{j \in R}\big|\mathrm{PET}_{\mathrm{pseudoCT},j} - \mathrm{PET}_{\mathrm{CT},j}\big|}{\sum_{j \in R}\mathrm{PET}_{\mathrm{CT},j}} \times 100\%, \qquad (10)$$
where R is the region of interest, PET_pseudoCT is the PET image reconstructed using the pseudo CT, and PET_CT is the PET image reconstructed using the ground-truth CT. The absolute value is used so that the total error does not vanish when summing the voxel errors inside a region. As it is hard to visualize the error for all voxels, we calculated the relative PET error inside specific regions using the corresponding predefined masks.
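The three metrics can be computed directly from the image volumes; a NumPy sketch is given below, assuming equations (8)–(10) take the normalized-L1 forms written above.

```python
import numpy as np

def relative_validation_loss(ct_pseudo, ct_true):
    """Equation (8): L1 difference relative to the ground-truth CT, in %."""
    return 100.0 * np.sum(np.abs(ct_pseudo - ct_true)) / np.sum(np.abs(ct_true))

def dice_bone(mu_pseudo, mu_true, threshold=0.1083):
    """Equation (9): Dice overlap of the bone masks
    (attenuation coefficient > 0.1083 cm^-1, about 200 HU)."""
    a, b = mu_pseudo > threshold, mu_true > threshold
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def relative_pet_error(pet_pseudo, pet_ct, mask):
    """Equation (10): relative PET error inside a region mask; the
    absolute voxel-wise difference keeps positive and negative errors
    from cancelling."""
    diff = np.abs(pet_pseudo[mask] - pet_ct[mask])
    return 100.0 * np.sum(diff) / np.sum(pet_ct[mask])
```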
3.4.1. Global quantification
We performed a global brain quantification using the brain mask from the MNI-ICBM 152 nonlinear 2009 template (Fonov et al 2009). The Dixon image of each patient was first registered to the MNI template, and the MNI template was then back-warped to the Dixon image space. In addition, a mask was defined to include the voxels whose intensity is larger than 30% of the maximum PET intensity (Ladefoged et al 2017). The final global brain mask is the intersection of these two masks. The histograms of the error image inside the global brain mask, defined as PET_pseudoCT − PET_CT, were also calculated to compare the global performance in terms of bias and standard deviation.
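A small sketch of the global brain mask construction described above; `mni_mask` is assumed to already be back-warped into the PET image space.

```python
import numpy as np

def global_brain_mask(mni_mask, pet_ct):
    """Intersection of the back-warped MNI brain mask with the voxels
    above 30% of the maximum PET intensity."""
    return mni_mask.astype(bool) & (pet_ct > 0.3 * pet_ct.max())
```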
3.4.2. Regional quantification
Apart from the whole-brain quantification, we are also interested in regional brain quantification, as individual regions play crucial roles in specific neuroscience studies. The automated anatomical labeling (AAL) template (Holmes et al 1998) was back-warped to the PET image space to define the regions. Four cortical lobes as well as the inner deep regions were used in the quantification. The mean and standard deviation of the relative PET error across all patients were calculated for each method in all the regions and the whole brain.
4. Results
4.1. Using Dixon MR images as input
We first compared the proposed Dixon-Unet method with the Dixon-Seg and Dixon-Atlas methods using all data sets. Fig. 4 shows three orthogonal views of the ground-truth CT images and the pseudo CT images generated by the different Dixon-based methods for one patient. Compared with the atlas method, the CT image produced by the proposed Dixon-Unet method has better bone and sinus structures. The Dixon-Seg method only shows the water and fat tissues. Table 1 presents the quantitative comparison of the predicted CT images using the relative validation loss and the Dice index. Clearly the Dixon-Unet method has the smallest validation loss and the highest Dice index in the bone region. Fig. 5 presents the PET reconstruction error images using the attenuation maps produced from the pseudo CT images shown in Fig. 4. Evidently the Dixon-Seg method has the largest error, especially near the bone and air-cavity regions. The Dixon-Atlas method produces smaller errors than the Dixon-Seg method, but still has large errors near the bone and the air cavity. Compared with these two methods, the Dixon-Unet method shows smaller errors across the whole brain.
Figure 4.

Three views of the attenuation maps (unit, cm−1) derived from the true CT image (first column) and the generated pseudo CT images using the Dixon-Seg method (second column), Dixon-atlas method (third column) and the proposed Dixon-Unet method (last column).
Table 1.
Comparisons of the generated pseudo-CT images when only Dixon images are available (based on 40 patient data sets). The Dice index of bone regions was computed for the whole brain, regions above and below the eyes.
| Method | Relative validation loss (%) | Bone Dice (whole) | Bone Dice (above the eyes) | Bone Dice (below the eyes) |
|---|---|---|---|---|
| Dixon-Seg | 32.70 ± 5.38 | – | – | – |
| Dixon-Atlas | 22.86 ± 2.34 | 0.52 ± 0.05 | 0.61 ± 0.06 | 0.30 ± 0.05 |
| Dixon-Unet | 13.84 ± 1.43 | 0.76 ± 0.04 | 0.82 ± 0.04 | 0.63 ± 0.06 |
Figure 5.

Three views of the PET reconstruction error images (PETpseudoCT – PETCT, unit: SUV) using the Dixon-Seg method (left column), the Dixon-atlas method (middle column) and the proposed Dixon-Unet method (right column).
To quantitatively characterize the influence of the different attenuation correction methods on the PET images, the mean relative PET error across all data sets for the whole brain and the different regions was calculated and is presented in Fig. 6. Clearly, in all regions the Dixon-Unet method performs best among the Dixon-based methods. The Dixon-Seg method has the largest error due to the missing bone signal. Comparing the standard deviations, the Dixon-Unet method has the smallest standard deviation in all regions, meaning it is robust across different populations by exploiting information from other patients' data sets. For all regions, the error of the Dixon-Unet method is below 3%. Fig. 7 shows the histograms of the PET error images for the three methods. The error image of the Dixon-Unet method has the smallest standard deviation and its histogram is closest to a zero-mean Gaussian distribution. The histograms of the Dixon-Seg and Dixon-Atlas methods are more skewed; in particular, the Dixon-Seg method is negatively biased due to the missing bone.
Figure 6.

The bar plot of the mean relative PET error for all the patient data sets. Standard deviations of the relative PET error across all patients are plotted as error bars.
Figure 7.

The histogram of PET error images for the three Dixon methods.
4.2. Using both Dixon and ZTE MR images as input
In the following analysis, results using the Dixon-Atlas, ZTE-Seg, DixonZTE-Unet and DixonZTE-GroupUnet methods are presented and compared using the twelve patient data sets with FDG scans. Fig. 8 shows three orthogonal views of the ground-truth CT images as well as the pseudo CT images generated by the different methods for one patient. Compared to the Dixon-Atlas method, the ZTE-Seg method can recover most of the bone regions, as the contrast between bone and neighboring voxels is good in the ZTE MR image. The images generated by the neural network methods are generally similar to those generated by the ZTE-Seg method, but with more details revealed and closer to the CT ground truth. To compare the pseudo CT quality for each data set, Fig. 9 shows the CT validation loss using the U-net and GroupU-net structures. The proposed GroupU-net method has a lower validation loss in 13 out of 14 data sets. Table 2 presents the quantitative comparison of the predicted CT images. The proposed GroupU-net method has the smallest validation loss and the highest Dice index in the bone region. Fig. 10 gives three views of the PET reconstruction error images based on the corresponding pseudo CT images presented in Fig. 8. The Dixon-Atlas method has the largest error and the DixonZTE-GroupUnet method has the smallest error. Fig. 11 shows the mean relative PET error for all twelve patients across the different regions. Clearly the Dixon-Atlas method has the largest mean error in all regions, and the ZTE-Seg method generates smaller errors than the Dixon-Atlas method. The proposed neural network methods perform better than both the Dixon-Atlas and ZTE-Seg methods. In particular, the DixonZTE-GroupUnet method produces the smallest errors in all regions. This trend can also be observed in the histograms of the PET error images shown in Fig. 12. The DixonZTE-GroupUnet method has both the smallest standard deviation and the smallest systematic bias.
Figure 8.

Comparison of the true CT image (first column) with generated pseudo CT images using the Dixon-Atlas method (second column), the ZTE-Seg method (third column), the DixonZTE-Unet method (fourth column) and the DixonZTE-GroupUnet method (last column).
Figure 9.

Comparison of the validation loss regarding the predicted CT images using U-net and the proposed GroupU-net when both Dixon and ZTE MR images are used as network input.
Table 2.
The comparison of the generated pseudo-CT when both Dixon and ZTE images are available (based on 14 patient data sets). The Dice index of bone was computed for the whole brain, regions above and below the eyes.
| Method | Relative validation loss (%) | Bone Dice (whole) | Bone Dice (above the eyes) | Bone Dice (below the eyes) |
|---|---|---|---|---|
| Dixon-Atlas | 23.33 ± 3.23 | 0.52 ± 0.05 | 0.61 ± 0.06 | 0.29 ± 0.05 |
| ZTE-Seg | 16.20 ± 2.28 | 0.69 ± 0.05 | 0.75 ± 0.05 | 0.56 ± 0.07 |
| DixonZTE-Unet | 13.58 ± 1.53 | 0.77 ± 0.04 | 0.83 ± 0.04 | 0.66 ± 0.07 |
| DixonZTE-GroupUnet | 12.62 ± 1.46 | 0.80 ± 0.04 | 0.86 ± 0.03 | 0.69 ± 0.06 |
Figure 10.

PET reconstruction error images (PETpseudoCT – PETCT, unit: SUV) using the Dixon-Atlas method (first column), the ZTE-Seg method (second column), the DixonZTE-Unet method (third column) and the DixonZTE-GroupUnet method (last column).
Figure 11.

The bar plot of the mean relative PET error for the patient data sets with both Dixon and ZTE images. Standard deviations of the absolute error for all the patients are plotted as the error bar.
Figure 12.

The histogram plot of the PET SUV difference inside the whole brain for Dixon-Atlas, ZTE-Seg, DixonZTE-Unet and DixonZTE-GroupUnet methods.
5. Discussion
Dixon MR acquisition is simple and fast, and it is widely deployed on current PET/MR systems as an option for attenuation map derivation. As the signal intensity is low in the bone region, it is hard to segment bone from Dixon images. In this work, we employed a deep neural network to predict pseudo CT images from Dixon images. From the CT images presented in Fig. 4, we can see that the shape of the bone region predicted by the neural network method is much better than that of the atlas method, indicating that the network can recognize the bone region from the Dixon input. Further quantitative analysis based on 40 patient data sets shows that the mean relative PET error of the whole brain using the neural network method is within 3%, which demonstrates the reproducibility of the proposed method.
With the development of new MR sequences, multiple MR images are available within the same scan. It is thus crucial to find an optimal way of integrating the information from multiple MR images without greatly increasing the network complexity, especially when the training data sets are not large. In this work we have proposed a modified U-net structure, named GroupU-net, to digest both Dixon and ZTE information through group convolution modules when the network goes deeper. The group convolution module first considers the cross-channel correlation through a 1×1 convolution, and then handles the spatial correlation in smaller groups. Quantitative analysis shows that the GroupU-net structure performs better than the U-net structure when the network complexity is the same. This demonstrates that model parameters can be used more efficiently by making the network wider when it goes deeper, and that improving the network structure can generate better attenuation maps. Designing and testing different network structures will be part of our future work.
For the case of using both Dixon and ZTE images as network input, there are 12 patient data sets in each training group. Quantitative analysis demonstrates that 12 patient data sets are enough to train a network that provides higher CT prediction accuracy than the state-of-the-art methods. One limitation of this work is that no brain pathology was reported for the data sets employed in this study, so we are unsure about the prediction accuracy for MR images with abnormal regions. If the test data do not lie in the training space due to population differences, the trained network may not accurately recover unseen structures. The robustness of the trained network to diseased data sets deserves further evaluation.
As for the objective function employed in the network training, the L1 norm was found to be better than the L2 norm, which results in blurrier images. We also tried an objective function that additionally included the L1 difference between the gradient images of the ground-truth CT and the pseudo CT in the horizontal and vertical directions. Though the generated CT images had sharper bone, quantification of the inner regions, such as the putamen and caudate, was worse. The sizes of the air-cavity regions in the ZTE and Dixon MR images are different; as the different methods extract information from Dixon only, ZTE only, or Dixon and ZTE combined, the delineation of the air cavities differs, as shown in Fig. 4 and Fig. 8. Additionally, we noticed that for the MR and CT images acquired on two different scanners, the jaw and head-neck regions could not be registered well in some cases due to positioning differences. This can produce errors, as the training presumes that the CT and MR images match perfectly. Generative adversarial networks, which do not depend on paired MR and CT images, might help solve this problem.
6. Conclusion
We have proposed neural network methods to generate continuous attenuation maps for brain PET imaging based on Dixon MR images alone, and based on Dixon and ZTE images combined. Analysis using real data sets shows that the neural network methods produce smaller PET quantification errors than the standard methods. When both Dixon and ZTE images are available, the proposed GroupU-net structure, which extracts features from Dixon and ZTE images through group convolution modules when the network goes deeper, achieves better performance than the U-net structure. Future work will focus on designing and testing different network structures to further improve the results, as well as testing the robustness of the trained network on diseased data sets.
References
- Alessio AM, Stearns CW, Tong S, Ross SG, Kohlmyer S, Ganin A, Kinahan PE. Application and evaluation of a measured spatially variant system model for PET image reconstruction. IEEE Transactions on Medical Imaging. 2010;29(3):938–949. doi: 10.1109/TMI.2010.2040188.
- Avants BB, Tustison N, Song G. Advanced normalization tools (ANTS). Insight Journal. 2009;2:1–35.
- Berker Y, Franke J, Salomon A, Palmowski M, Donker HC, Temur Y, Mottaghy FM, Kuhl C, Izquierdo-Garcia D, Fayad ZA, et al. MRI-based attenuation correction for hybrid PET/MRI systems: a 4-class tissue segmentation technique using a combined ultrashort-echo-time/Dixon MRI sequence. Journal of Nuclear Medicine. 2012;53(5):796–804. doi: 10.2967/jnumed.111.092577.
- Burgos N, Cardoso MJ, Thielemans K, Modat M, Pedemonte S, Dickson J, Barnes A, Ahmed R, Mahoney CJ, Schott JM, et al. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Transactions on Medical Imaging. 2014;33(12):2332–2341. doi: 10.1109/TMI.2014.2340135.
- Carney JP, Townsend DW, Rappoport V, Bendriem B. Method for transforming CT images for attenuation correction in PET/CT imaging. Medical Physics. 2006;33(4):976–983. doi: 10.1118/1.2174132.
- Catana C, Benner T, van der Kouwe A, Byars L, Hamm M, Chonde DB, Michel CJ, El Fakhri G, Schmand M, Sorensen AG. MRI-assisted PET motion correction for neurologic studies in an integrated MR-PET scanner. Journal of Nuclear Medicine. 2011;52(1):154–161. doi: 10.2967/jnumed.110.079343.
- Chen H, Zhang Y, Zhang W, Liao P, Li K, Zhou J, Wang G. Low-dose CT via convolutional neural network. Biomedical Optics Express. 2017;8(2):679–694. doi: 10.1364/BOE.8.000679.
- Chollet F. Xception: deep learning with depthwise separable convolutions. arXiv preprint. 2016.
- Defrise M, Rezaei A, Nuyts J. Time-of-flight PET data determine the attenuation sinogram up to a constant. Physics in Medicine & Biology. 2012;57(4):885. doi: 10.1088/0031-9155/57/4/885.
- Fonov VS, Evans AC, McKinstry RC, Almli C, Collins D. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage. 2009;47:S102.
- Gong K, Cheng-Liao J, Wang G, Chen KT, Catana C, Qi J. Direct Patlak reconstruction from dynamic PET data using the kernel method with MRI information based on structural similarity. IEEE Transactions on Medical Imaging. 2017c. doi: 10.1109/TMI.2017.2776324.
- Gong K, Guan J, Kim K, Zhang X, Fakhri GE, Qi J, Li Q. Iterative PET image reconstruction using convolutional neural network representation. arXiv preprint arXiv:1710.03344. 2017b. doi: 10.1109/TMI.2018.2869871.
- Gong K, Zhou J, Tohme M, Judenhofer M, Yang Y, Qi J. Sinogram blurring matrix estimation from point sources measurements with rank-one approximation for fully 3D PET. IEEE Transactions on Medical Imaging. 2017a. doi: 10.1109/TMI.2017.2711479.
- Grant AM, Deller TW, Khalighi MM, Maramraju SH, Delso G, Levin CS. NEMA NU 2-2012 performance studies for the SiPM-based ToF-PET component of the GE SIGNA PET/MR system. Medical Physics. 2016;43(5):2334–2343. doi: 10.1118/1.4945416.
- Han X. MR-based synthetic CT generation using a deep convolutional neural network method. Medical Physics. 2017;44(4):1408–1419. doi: 10.1002/mp.12155.
- Holmes CJ, Hoge R, Collins L, Woods R, Toga AW, Evans AC. Enhancement of MR images using registration for signal averaging. Journal of Computer Assisted Tomography. 1998;22(2):324–333. doi: 10.1097/00004728-199803000-00032.
- Huynh T, Gao Y, Kang J, Wang L, Zhang P, Lian J, Shen D. Estimating CT image from MRI data using structured random forest and auto-context model. IEEE Transactions on Medical Imaging. 2016;35(1):174–183. doi: 10.1109/TMI.2015.2461533.
- Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning. 2015:448–456.
- Izquierdo-Garcia D, Hansen AE, Förster S, Benoit D, Schachoff S, Fürst S, Chen KT, Chonde DB, Catana C. An SPM8-based approach for attenuation correction combining segmentation and nonrigid template formation: application to simultaneous PET/MR brain imaging. Journal of Nuclear Medicine. 2014;55(11):1825–1830. doi: 10.2967/jnumed.113.136341.
- Kang E, Min J, Ye JC. A deep convolutional neural network using directional wavelets for low-dose x-ray CT reconstruction. arXiv preprint arXiv:1610.09736. 2016. doi: 10.1002/mp.12344.
- Keereman V, Fierens Y, Broux T, De Deene Y, Lonneux M, Vandenberghe S. MRI-based attenuation correction for PET/MRI using ultrashort echo time sequences. Journal of Nuclear Medicine. 2010;51(5):812–818. doi: 10.2967/jnumed.109.065425.
- Khalifé M, Fernandez B, Jaubert O, Soussan M, Brulon V, Buvat I, Comtat C. Subject-specific bone attenuation correction for brain PET/MR: can ZTE-MRI substitute CT scan accurately? Physics in Medicine & Biology. 2017;62(19):7814. doi: 10.1088/1361-6560/aa8851.
- Kim K, Yang J, El Fakhri G, Seo Y, Li Q. Penalized MLAA with spatially-encoded anatomic prior in TOF PET/MR. Nuclear Science Symposium and Medical Imaging Conference Record. 2016:1–4.
- Kinahan PE, Hasegawa BH, Beyer T. X-ray-based attenuation correction for positron emission tomography/computed tomography scanners. Seminars in Nuclear Medicine. 2003;33(3):166–179. doi: 10.1053/snuc.2003.127307.
- Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
- Ladefoged CN, Benoit D, Law I, Holm S, Kjær A, Højgaard L, Hansen AE, Andersen FL. Region specific optimization of continuous linear attenuation coefficients based on UTE (RESOLUTE): application to PET/MR brain imaging. Physics in Medicine & Biology. 2015;60(20):8047. doi: 10.1088/0031-9155/60/20/8047.
- Ladefoged CN, Law I, Anazodo U, Lawrence KS, Izquierdo-Garcia D, Catana C, Burgos N, Cardoso MJ, Ourselin S, Hutton B, et al. A multi-centre evaluation of eleven clinically feasible brain PET/MRI attenuation correction techniques using a large cohort of patients. NeuroImage. 2017;147:346–359. doi: 10.1016/j.neuroimage.2016.12.010.
- Leynes AP, Yang J, Shanbhag DD, Kaushik SS, Seo Y, Hope TA, Wiesinger F, Larson PE. Hybrid ZTE/Dixon MR-based attenuation correction for quantitative uptake estimation of pelvic lesions in PET/MRI. Medical Physics. 2017;44(3):902–913. doi: 10.1002/mp.12122.
- Leynes AP, Yang J, Wiesinger F, Kaushik SS, Shanbhag DD, Seo Y, Hope TA, Larson PE. Direct pseudoCT generation for pelvis PET/MRI attenuation correction using deep convolutional neural networks with multi-parametric MRI: zero echo-time and Dixon deep pseudoCT (ZeDD-CT). Journal of Nuclear Medicine. 2017b. doi: 10.2967/jnumed.117.198051.
- Li Q, Li H, Kim K, El Fakhri G. Joint estimation of activity image and attenuation sinogram using time-of-flight positron emission tomography data consistency condition filtering. Journal of Medical Imaging. 2017;4(2):023502. doi: 10.1117/1.JMI.4.2.023502.
- Liu F, Jang H, Kijowski R, Bradshaw T, McMillan AB. Deep learning MR imaging–based attenuation correction for PET/MR imaging. Radiology. 2017:170700. doi: 10.1148/radiol.2017170700.
- Martinez-Möller A, Souvatzoglou M, Delso G, Bundschuh RA, Chefd'hotel C, Ziegler SI, Navab N, Schwaiger M, Nekolla SG. Tissue classification as a potential approach for attenuation correction in whole-body PET/MRI: evaluation with PET/CT data. Journal of Nuclear Medicine. 2009;50(4):520–526. doi: 10.2967/jnumed.108.054726.
- Mehranian A, Zaidi H. Joint estimation of activity and attenuation in whole-body TOF PET/MRI using constrained Gaussian mixture models. IEEE Transactions on Medical Imaging. 2015;34(9):1808–1821. doi: 10.1109/TMI.2015.2409157.
- Mehranian A, Zaidi H, Reader AJ. MR-guided joint reconstruction of activity and attenuation in brain PET-MR. NeuroImage. 2017;162:276–288. doi: 10.1016/j.neuroimage.2017.09.006.
- Nie D, Trullo R, Lian J, Petitjean C, Ruan S, Wang Q, Shen D. Medical image synthesis with context-aware generative adversarial networks. Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. 2017:417–425. doi: 10.1007/978-3-319-66179-7_48.
- Qi J, Leahy RM, Cherry SR, Chatziioannou A, Farquhar TH. High-resolution 3D Bayesian image reconstruction using the microPET small-animal scanner. Physics in Medicine & Biology. 1998;43(4):1001. doi: 10.1088/0031-9155/43/4/027.
- Rezaei A, Defrise M, Bal G, Michel C, Conti M, Watson C, Nuyts J. Simultaneous reconstruction of activity and attenuation in time-of-flight PET. IEEE Transactions on Medical Imaging. 2012;31(12):2224–2233. doi: 10.1109/TMI.2012.2212719.
- Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015:234–241.
- Sekine T, ter Voert EE, Warnock G, Buck A, Huellner M, Veit-Haibach P, Delso G. Clinical evaluation of zero-echo-time attenuation correction for brain 18F-FDG PET/MRI: comparison with atlas attenuation correction. Journal of Nuclear Medicine. 2016;57(12):1927–1932. doi: 10.2967/jnumed.116.175398.
- Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:2818–2826.
- Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging. 2010;29(6):1310–1320. doi: 10.1109/TMI.2010.2046908.
- Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, Feng D, Liang D. Accelerating magnetic resonance imaging via deep learning. Proceedings of International Symposium on Biomedical Imaging. 2016:514–517. doi: 10.1109/ISBI.2016.7493320.
- Wollenweber S, Ambwani S, Delso G, Lonn A, Mullick R, Wiesinger F, Piti Z, Tari A, Novak G, Fidrich M. Evaluation of an atlas-based PET head attenuation correction using PET/CT & MR patient data. IEEE Transactions on Nuclear Science. 2013;60(5):3383–3390.
- Wu D, Kim K, Dong B, Li Q. End-to-end abnormality detection in medical imaging. arXiv preprint arXiv:1711.02074. 2017b.
- Wu D, Kim K, El Fakhri G, Li Q. Iterative low-dose CT reconstruction with priors trained by artificial neural network. IEEE Transactions on Medical Imaging. 2017a. doi: 10.1109/TMI.2017.2753138.
- Xie S, Girshick R, Dollar P, Tu Z, He K. Aggregated residual transformations for deep neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
- Yang J, Jian Y, Jenkins N, Behr SC, Hope TA, Larson PE, Vigneron D, Seo Y. Quantitative evaluation of atlas-based attenuation correction for brain PET in an integrated time-of-flight PET/MR imaging system. Radiology. 2017a:161603. doi: 10.1148/radiol.2017161603.
- Yang J, Wiesinger F, Kaushik S, Shanbhag D, Hope TA, Larson PE, Seo Y. Evaluation of sinus/edge corrected ZTE-based attenuation correction in brain PET/MRI. Journal of Nuclear Medicine. 2017. doi: 10.2967/jnumed.116.188268.
