Published in final edited form as: Comput Med Imaging Graph. 2023 Apr 26;107:102227. doi: 10.1016/j.compmedimag.2023.102227

On the effect of training database size for MR-based synthetic CT generation in the head

Seyed Iman Zare Estakhraji a,*, Ali Pirasteh a,b, Tyler Bradshaw a, Alan McMillan a,b,c,d
PMCID: PMC10483321  NIHMSID: NIHMS1895731  PMID: 37167815

Abstract

Generation of computed tomography (CT) images from magnetic resonance (MR) images using deep learning methods has recently demonstrated promise in improving MR-guided radiotherapy and PET/MR imaging.

Purpose:

To investigate the performance of unsupervised training using a large number of unpaired data sets, as well as the potential gain in performance from fine-tuning with supervised training using spatially registered data sets, for the generation of synthetic computed tomography (sCT) images from magnetic resonance (MR) images.

Materials and methods:

A cycleGAN method consisting of two generators (residual U-Net) and two discriminators (patchGAN) was used for unsupervised training, which utilized unpaired T1-weighted MR and CT images (2061 sets for each modality). Five supervised models were then fine-tuned, starting from the generator of the unsupervised model, using 1, 10, 25, 50, and 100 pairs of spatially registered MR and CT images. Four supervised models were also trained from scratch using 10, 25, 50, and 100 pairs of spatially registered MR and CT images with only the residual U-Net generator. All models were evaluated on a holdout test set of spatially registered images from 253 patients, including 30 with significant pathology. sCT images were compared against the acquired CT images using mean absolute error (MAE), Dice coefficient, and structural similarity index (SSIM). sCT images from 60 test subjects, generated by the unsupervised model and by the most accurate of the fine-tuned and supervised models, were qualitatively evaluated by a radiologist.

Results:

While unsupervised training produced realistic-appearing sCT images, addition of even one set of registered images improved quantitative metrics. Addition of more paired data sets to the training further improved image quality, with the best results obtained using the highest number of paired data sets (n=100). Supervised training was found to be superior to unsupervised training, while fine-tuned training showed no clear benefit over supervised learning, regardless of the training sample size.

Conclusion:

Supervised learning (using either fine-tuning or full supervision) leads to significantly higher quantitative accuracy in the generation of sCT from MR images. However, fine-tuned training using both a large number of unpaired image sets and a small number of registered pairs was generally no better than supervised training using the registered image sets alone, suggesting that well-registered paired data sets are more important for training than a large set of unpaired data.

Keywords: Generative adversarial networks (GAN), Synthetic CT generation, Fine-Tuning, MR-guided radiotherapy

1. Introduction

Magnetic resonance imaging (MRI) is useful in radiotherapy planning for improved organ-at-risk segmentation as well as differentiation between tumor and normal tissue, especially in the brain, pelvis, and liver, due to its superior soft tissue contrast over computed tomography (CT). However, CT enables a more straightforward way to approximate the tissue electron density map needed for radiotherapy treatment planning (Khoo and Joon, 2006; Schmidt and Payne, 2015; Jonsson et al., 2019; Kerkmeijer et al., 2018). Hence, both MRI and CT are often used together for treatment planning, requiring MR images to be spatially registered to CT images, which is computationally expensive and can be error-prone, introducing systematic uncertainties throughout the course of treatment (Edmund and Nyholm, 2017). Furthermore, additional CT acquisitions may be unfavorable due to increased ionizing radiation exposure, workflow challenges, and increased cost (Karlsson et al., 2009). There is therefore strong motivation to perform treatment planning using MRI as the sole imaging modality (Devic, 2012) by generating synthetic CT images (sCT or pseudo-CT) from MR images (Edmund and Nyholm, 2017; Price et al., 2016; Johnstone et al., 2018). Likewise, sCT generation can be used to provide attenuation correction for positron emission tomography/MRI (PET/MRI) imaging (McMillan and Bradshaw, 2021; Spadea et al., 2021).

Recently, artificial intelligence (AI)-based methods have shown promise at generating sCT images (Boulanger et al., 2021). In AI-based methods, a deep learning model is trained to map the pixel values of MR images into a CT-like image with values in Hounsfield units (HU). For example, a deep convolutional neural network (CNN) can be used for MR-to-CT mapping by training with spatially registered data sets (Liu et al., 2018; Jang et al., 2018; Liu et al., 2019; Chen et al., 2018). These approaches typically utilize mean squared error or mean absolute error loss terms, which can lead to blurriness in the resulting sCT images. This limitation can be addressed by adding an adversarial loss term using generative adversarial networks (GANs) (Goodfellow et al., 2014). In GANs, a discriminator is introduced to distinguish the generated sCT from the real CT (Isola et al., 2017). The structure of a GAN enables a potential improvement in the perceived quality of the generated sCT (Nie et al., 2018; Maspero et al., 2018; Emami et al., 2018; Kazemifar et al., 2019).

Conventionally, CNNs, including GAN-based methods, require MR and CT images to be perfectly spatially registered. Cross-modality registration for large datasets can take considerable effort and can introduce errors. To overcome this limitation, several GANs with dual learning using a cyclic-loss approach have been proposed (He et al., 2016; Yi et al., 2017; Kim et al., 2017; Zhu et al., 2017). These circularity-based approaches lack pixel-level supervision between the translated images and the target images, while maintaining image-level adversarial supervision in the target domain. For example, CycleGAN adds a pixel-level cycled regression in the source domain (source → target → source) to the adversarial difference between the target and synthetic images. Several studies (Wolterink et al., 2017; Xiang et al., 2018; Yang et al., 2018; Lei et al., 2019; Klages et al., 2020; Jabbarpour et al., 2022) implemented a CycleGAN (Zhu et al., 2017) to generate synthetic CT from MR images without requiring spatially registered MR and CT images. When a CycleGAN is used for synthetic CT generation, two sets of networks are trained simultaneously to map from MR to CT and, cyclically, from CT to MR, without requiring carefully registered input images. Although unsupervised training methods such as CycleGAN can be applied when only unpaired data are available, their performance can be limited by the absence of a supervised loss term when paired data are in fact available. Moreover, circularity-based approaches such as CycleGAN have their own limitations, such as being restricted to one-to-one mappings (Almahairi et al., 2018) and lacking a mechanism to enforce regularity (Zhang et al., 2019). Furthermore, because pixel-wise supervision is absent in the target domain, the translated images are not guaranteed to have the desired quality even if cycle consistency is satisfied during training. Therefore, when both unpaired and paired data sets are available, fine-tuning (semi-supervised training) using a supervised loss term can be utilized to improve performance.

Given the known "data-hungry" nature of AI-based techniques (Thrall et al., 2018), the impact of training set size on the performance of AI-based sCT generation from MRI remains unknown, in large part due to the challenge of obtaining large data sets of MRI and CT images from the same subjects. The nature of the training set is another unknown: whether accurate sCT generation can only be achieved using training sets of spatially registered (paired) MRI-CT sets, or also with a larger number of MRI-CT sets that are not spatially registered (unpaired), has not been investigated. This paper studies MR-based synthetic CT generation in the head using a large number of unpaired data sets as well as a small number of paired data sets. We evaluated the impact of training set size utilizing a cohort of more than 2000 MR and CT data sets, consisting of both paired and unpaired sets. To evaluate the performance of fine-tuned training, the model trained with the unpaired data set was fine-tuned using spatially registered data sets, retaining the same generator architecture as the CycleGAN. Additionally, we compared sCT models trained using the spatially unregistered data set with sCT models fine-tuned with increasing numbers of spatially registered data sets, as well as with models trained only on increasing numbers of spatially registered data sets (supervised training with the same generator architecture as the CycleGAN). Models were evaluated using a set of more than 200 subjects including both normal and abnormal brains. To the best of the authors' knowledge, this is the largest test set used to evaluate a trained model for the generation of sCT images.

2. Materials and methods

2.1. Data acquisition

Anonymized clinical imaging data were obtained under an IRB-approved waiver of consent from the PACS system at our institution. Selection criteria included patients who had undergone both CT and MR imaging of the head within 48 h, with MR imaging performed at 1.5T (MR450w or HDxt, GE Healthcare) and including a specific pulse sequence: 3D T1-weighted inversion-prepared gradient-recalled echo, echo time = 3.5 ms, inversion time = 450 ms, repetition time = 9.5 ms, in-plane resolution 0.98 × 0.98 mm, slice thickness = 2.4–3.0 mm. CT exams were performed on one of three scanners (Optima CT 660, Discovery CT750HD, or Revolution GSI, GE Healthcare) with the following acquisition and reconstruction parameters: 0.45 × 0.45 mm transaxial resolution, 1.25 or 2.5 mm slice thickness, 120 kVp, automatic exposure control with noise index of 2.8–12.4, and helical pitch of 0.53. A small subset of this data set (92 patients) was included in a previous study utilizing a different approach [reference blinded for review]. MR and CT exams were obtained for 2420 patients. 2061 minimally pre-processed subjects were used for the spatially unregistered (unpaired) training. For 359 subjects, the CT images were registered to their corresponding MR images using the registration function of the ANTsPy package (Tustison et al., 2021) with the "Symmetric normalization" transform. Moreover, all images were resampled into a 256 × 256 matrix with a voxel size of 1.0 × 1.0 × 1.0 mm3. For model training, MR images were intensity-normalized by dividing them by a constant value of 3500 (which we found to be comparable to zero-mean unit-variance normalization), and all CT intensity values were first clipped to [−1000, 2500] HU and then normalized to [−1, 1]. To minimize the impact of image features outside the head on network performance, for each spatially registered MR-CT subject used for training the fine-tuned or supervised models, as well as for testing, a mask was extracted from the CT using the "masking" function from the Nilearn package (Abraham et al., 2014) and applied to the corresponding registered MRI. Up to 100 of these cases were used for fine-tuning a pre-trained model or for supervised training. Six subjects were used to determine early stopping, and the remaining 253 subjects were used for independent testing.
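
As an illustration of this preprocessing pipeline, the following Python sketch registers a CT to its MR, resamples, normalizes intensities, and applies a head mask. The file names and the threshold-based mask are hypothetical stand-ins (the paper uses Nilearn's masking function), and the exact calls may differ from the authors' code.

```python
import ants          # ANTsPy
import numpy as np

# Hypothetical file names for one subject.
mr = ants.image_read("subject_mr.nii.gz")
ct = ants.image_read("subject_ct.nii.gz")

# Register the CT to its corresponding MR using "SyN" (Symmetric Normalization).
reg = ants.registration(fixed=mr, moving=ct, type_of_transform="SyN")
ct_reg = reg["warpedmovout"]

# Resample both volumes to 1.0 mm isotropic voxels.
mr_rs = ants.resample_image(mr, (1.0, 1.0, 1.0), use_voxels=False)
ct_rs = ants.resample_image(ct_reg, (1.0, 1.0, 1.0), use_voxels=False)

# Intensity normalization as described above: MR divided by a constant;
# CT clipped to [-1000, 2500] HU and mapped linearly onto [-1, 1].
mr_norm = mr_rs.numpy() / 3500.0
ct_hu = np.clip(ct_rs.numpy(), -1000.0, 2500.0)
ct_norm = (ct_hu + 1000.0) / 3500.0 * 2.0 - 1.0

# Head mask derived from the CT (the paper uses Nilearn's masking function;
# a simple HU threshold is shown here as a stand-in), applied to the MR.
head_mask = (ct_hu > -500.0).astype(mr_norm.dtype)
mr_norm = mr_norm * head_mask
```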

2.2. Network architectures

CycleGAN is a generative adversarial network (GAN) using two generators: an MRI-to-CT generator $G_{sCT}(MR_{in})$ and a CT-to-MRI generator $G_{sMR}(CT_{in})$ (Fig. 1). In our implementation, each generator consists of an encoder with max-pooling layers, a transformer of 9 residual blocks, and a decoder with unpooling layers, as depicted in Fig. 2. Each convolutional layer is followed by instance normalization and a ReLU activation (not shown in Fig. 2). The encoder reduces the spatial size of the representation while increasing the number of channels, and the resulting output is passed through the series of 9 residual blocks; the decoder then produces the final image. Furthermore, each generator has a corresponding discriminator that differentiates synthesized images from real ones: a real-CT/synthetic-CT discriminator $D_{sCT}$ corresponding to $G_{sCT}$, and a real-MRI/synthetic-MRI discriminator $D_{sMR}$ corresponding to $G_{sMR}$. Each discriminator is a convolutional neural network of the "PatchGAN" type (Isola et al., 2017), shown in Fig. 3. When a PatchGAN is fed an input, it slides a window over the input and outputs the probability that each corresponding image patch is "real" or "fake", as a tensor (for example, 1 × 30 × 30 for the CycleGAN used here).

Fig. 1.

Schematic view of the cycleGAN: The left side depicts the cycle involving the $G_{sCT}$ generator, starting with MR images, producing sCT images, and yielding cyclic synthetic MR images. The right side depicts the $G_{sMR}$ generator, starting with CT images, producing synthetic MR images, and ending with cyclic synthetic CT images. Discriminators are applied to both real and synthetic images to determine whether the images are real or fake.

Fig. 2.

Architecture for the CycleGAN generator: The generator takes inputs of (3 × 256 × 256) and applies the encoder layer for down-sampling. Nine blocks of residual networks are applied, and finally the inputs are upsampled to (3 × 256 × 256). Each residual block includes two normalization layers and an activation layer.
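
A minimal PyTorch sketch of a generator of this form (7 × 7 input/output convolutions, two stride-2 downsampling layers, nine residual blocks, and two upsampling layers) is shown below. The layer widths follow common CycleGAN implementations and are assumptions rather than the authors' exact code.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with instance norm; the input is added to the output."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Encoder -> 9 residual blocks -> decoder; 3-channel in/out (3 adjacent slices)."""
    def __init__(self, in_ch=3, base=64, n_blocks=9):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
                  nn.InstanceNorm2d(base), nn.ReLU(inplace=True)]
        # Encoder: halve the spatial size and double the channels, twice.
        for i in range(2):
            layers += [nn.Conv2d(base * 2**i, base * 2**(i + 1), 3, stride=2, padding=1),
                       nn.InstanceNorm2d(base * 2**(i + 1)), nn.ReLU(inplace=True)]
        layers += [ResidualBlock(base * 4) for _ in range(n_blocks)]
        # Decoder: mirror of the encoder.
        for i in reversed(range(2)):
            layers += [nn.ConvTranspose2d(base * 2**(i + 1), base * 2**i, 3,
                                          stride=2, padding=1, output_padding=1),
                       nn.InstanceNorm2d(base * 2**i), nn.ReLU(inplace=True)]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(base, in_ch, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):  # x: (N, 3, 256, 256) -> (N, 3, 256, 256)
        return self.model(x)
```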

Fig. 3.

Architecture for the CycleGAN discriminator: It takes an image as input and returns a 1 × 30 × 30 tensor. Each output value represents the classification result for a fixed-size patch of the input, meaning that the discriminator checks (possibly overlapping) patches of the input for being real or fake.
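
A corresponding PatchGAN discriminator sketch follows; with a 256 × 256 input, the stack of three stride-2 and two stride-1 4 × 4 convolutions yields the 1 × 30 × 30 output described above. The channel widths are assumptions based on common PatchGAN implementations.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN: classifies overlapping patches of the input as real or fake."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        def block(i, o, stride, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_ch, base, 2, norm=False),              # 256 -> 128
            *block(base, base * 2, 2),                       # 128 -> 64
            *block(base * 2, base * 4, 2),                   # 64  -> 32
            *block(base * 4, base * 8, 1),                   # 32  -> 31
            nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),  # 31  -> 30
        )

    def forward(self, x):  # x: (N, 3, 256, 256) -> (N, 1, 30, 30)
        return self.model(x)
```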

$G_{sCT}$ is given three adjacent MR slices as input and generates three sCT slices of the same size as the input images. $D_{sCT}$ predicts whether a given CT image is real or synthetic, resulting in a loss function $\ell_{sCT}$. Simultaneously, $G_{sMR}$ takes CT slices as input and generates sMR slices, and $D_{sMR}$ predicts whether an MR image is real or synthetic, resulting in $\ell_{sMR}$. The adversarial loss terms (Zhu et al., 2017), $\ell_{sCT}$ and $\ell_{sMR}$, are defined as:

$$\ell_{sCT}\left(G_{sCT}, MR_{in}, D_{sCT}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(1 - D_{sCT}\left(G_{sCT}\left(MR_{in}^{(i)}\right)\right)\right)^{2} \tag{1a}$$

$$\ell_{sMR}\left(G_{sMR}, CT_{in}, D_{sMR}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(1 - D_{sMR}\left(G_{sMR}\left(CT_{in}^{(i)}\right)\right)\right)^{2} \tag{1b}$$

The adversarial losses defined in Eq. (1) alone are not adequate to guarantee good results for spatially unregistered data; therefore, a cycle consistency loss is also defined:

$$\ell_{c}\left(G_{sMR}, CT_{in}, G_{sCT}, MR_{in}\right) = \frac{1}{m}\sum_{i=1}^{m}\left[\left\|G_{sCT}\left(G_{sMR}\left(CT_{in}^{(i)}\right)\right) - CT_{in}^{(i)}\right\|_{1} + \left\|G_{sMR}\left(G_{sCT}\left(MR_{in}^{(i)}\right)\right) - MR_{in}^{(i)}\right\|_{1}\right] \tag{2}$$

The cycle loss encourages the output of the first generator, when fed into the second generator, to closely match the original image. Therefore, in every cycle two GANs are trained, and an MR slice fed into $G_{sCT}$ should be identical to $MR_{cycle} = G_{sMR}(G_{sCT}(MR_{in}))$, and a CT slice fed into $G_{sMR}$ should be identical to $CT_{cycle} = G_{sCT}(G_{sMR}(CT_{in}))$. Finally, defining a hyperparameter $\lambda$ ($\lambda = 10$ in this paper), the full objective function is defined as:

$$\ell_{full} = \ell_{adv} + \lambda\,\ell_{c} \tag{3}$$

where $\ell_{adv}$ denotes the adversarial losses obtained from Eq. (1).

For unsupervised (i.e., spatially unregistered) training, the loss term in Eq. (3) is used. For fine-tuning the unsupervised model with registered data, and for supervised training, the following loss term, representing the error between the sCT and the real registered CT, was used to enforce pixel-wise supervision between the target and translated images (see Fig. 1):

$$\ell_{supervised}\left(G_{sCT}, MR_{in}\right) = \frac{1}{m}\sum_{i=1}^{m}\left\|G_{sCT}\left(MR_{in}^{(i)}\right) - CT_{in}^{(i)}\right\|_{1} \tag{4}$$
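
As a compact illustration (not the authors' training code), the loss terms of Eqs. (1)-(4) can be written in PyTorch as follows; the function and variable names are ours.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # least-squares adversarial terms, Eq. (1)
l1 = nn.L1Loss()    # cycle term, Eq. (2), and supervised term, Eq. (4)
lam = 10.0          # lambda in Eq. (3)

def generator_loss(G_sCT, G_sMR, D_sCT, D_sMR, mr, ct):
    """Full unsupervised objective of Eq. (3) for one unpaired batch."""
    sct, smr = G_sCT(mr), G_sMR(ct)
    # Adversarial terms: generators try to make the discriminators output "real" (1).
    d_sct, d_smr = D_sCT(sct), D_sMR(smr)
    l_adv = mse(d_sct, torch.ones_like(d_sct)) + mse(d_smr, torch.ones_like(d_smr))
    # Cycle consistency: MR -> sCT -> MR and CT -> sMR -> CT, Eq. (2).
    l_cyc = l1(G_sMR(sct), mr) + l1(G_sCT(smr), ct)
    return l_adv + lam * l_cyc

def supervised_loss(G_sCT, mr, ct_registered):
    """Pixel-wise L1 term of Eq. (4) for a spatially registered pair."""
    return l1(G_sCT(mr), ct_registered)
```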

2.3. Network training

The cycleGAN model was trained with a batch size of 1 on an NVIDIA DGX A100 system with eight A100 GPUs, 256 CPU cores, and 1 TB of RAM; one GPU at a time was used to train each model. Training started with a learning rate of 0.0002 for the generators and 0.0001 for the discriminators. We used Adam optimization with hyperparameters β1 and β2 set at 0.5 and 0.99, respectively (a code sketch of this setup follows the list of approaches below). Three main training approaches were employed: (1) Unsupervised Training: unregistered axial MR and CT images of 2061 subjects were included; the model was trained for 21 epochs and the validation set was used for early stopping. Moreover, to study the effect of spatial registration on unsupervised training, smaller data sets (10 and 25 spatially registered MR-CT pairs) were used and the model was trained for 600 epochs, with the same validation set used for early stopping. (2) Fine-Tuned Training: the MRI-to-CT generator $G_{sCT}$ of the Unsupervised Training model was further trained using up to 100 spatially registered MR-CT data set(s). This fine-tuning was repeated five times, each for 150 epochs, with different numbers of spatially registered data sets (1, 10, 25, 50, 100) to investigate the effect of an increasing number of registered pairs on fine-tuning.

(3) Supervised Training: only the spatially registered data sets were employed, and an MRI-to-CT generator $G_{sCT}$ was trained from scratch using the loss term from Eq. (4). Models were trained with different numbers of paired data sets (10, 25, 50, 100) for 150 epochs. Overall, 12 MRI-to-CT models were trained across the Unsupervised (N=3), Fine-Tuned (N=5), and Supervised (N=4) strategies.
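
The optimizer settings described above (Adam, generator/discriminator learning rates of 0.0002/0.0001, β1 = 0.5, β2 = 0.99) correspond to the following sketch, assuming the Generator and PatchDiscriminator classes from Section 2.2 are in scope.

```python
import itertools
import torch

# Instances of the Generator and PatchDiscriminator sketches from Section 2.2.
G_sCT, G_sMR = Generator(), Generator()                    # MR->CT, CT->MR
D_sCT, D_sMR = PatchDiscriminator(), PatchDiscriminator()

# Adam with beta1=0.5, beta2=0.99; generator lr 0.0002, discriminator lr 0.0001.
opt_G = torch.optim.Adam(
    itertools.chain(G_sCT.parameters(), G_sMR.parameters()),
    lr=2e-4, betas=(0.5, 0.99))
opt_D = torch.optim.Adam(
    itertools.chain(D_sCT.parameters(), D_sMR.parameters()),
    lr=1e-4, betas=(0.5, 0.99))
```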

Data from 253 spatially registered subjects were used for testing: 223 subjects with normal-appearing scans and 30 subjects with anatomical abnormalities; none of these subjects were included in the training or validation sets. Each test subject was evaluated with all 12 trained models, and the resulting sCT images were compared to the real CT images. For every sCT image, the bone and soft-tissue regions were extracted using Hounsfield unit ranges of HU ≥ 250 and −50 ≤ HU ≤ 50, respectively, and were compared with the bone and soft-tissue regions obtained with the same criteria from the corresponding real CT image using the mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|G_{sCT}(MR_{in})(j) - CT_{in}(j)\right| \tag{5}$$

where $j$ indexes a pixel in the real and synthesized CT images. In this paper, the MAE function of the "metrics" class from the MONAI Project (Consortium, 2020) was used for the evaluation. The pixel-wise Dice coefficient (ranging from 0 to 1), which measures the overlap between two labels, was also calculated for the bone and soft-tissue regions of the real CT and sCT images; it was computed using the "distance.dice" function from the SciPy package (Virtanen et al., 2020), which returns the Dice dissimilarity between two boolean 1-D arrays. The structural similarity index measure (SSIM) was also used to evaluate the trained models, using the "structural_similarity" function from the scikit-image package (van der Walt et al., 2014). A pair-wise Wilcoxon signed-rank test was used to compare these metrics between the unsupervised method and the other 11 training methods. A p-value of less than 0.0042 was considered statistically significant at the 0.05 level after correction for multiple comparisons using the False Discovery Rate (FDR).
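
A per-subject evaluation can be sketched as follows, using the SciPy and scikit-image functions named above. MAE is computed directly with NumPy here (the paper uses MONAI's MAE metric), and the exact region and data-range conventions are assumptions.

```python
import numpy as np
from scipy.spatial.distance import dice as dice_dissimilarity
from scipy.stats import wilcoxon
from skimage.metrics import structural_similarity

def evaluate_subject(sct, ct):
    """sct, ct: NumPy arrays of HU values for one subject."""
    results = {}
    for name, lo, hi in [("bone", 250.0, np.inf), ("soft_tissue", -50.0, 50.0)]:
        m_ct = (ct >= lo) & (ct <= hi)     # region defined on the real CT
        m_sct = (sct >= lo) & (sct <= hi)  # same HU criterion on the sCT
        # MAE over the real-CT region (Eq. (5)); the region convention is assumed.
        results[f"mae_{name}"] = float(np.mean(np.abs(sct[m_ct] - ct[m_ct])))
        # Dice coefficient = 1 - Dice dissimilarity of the boolean masks.
        results[f"dice_{name}"] = 1.0 - dice_dissimilarity(m_ct.ravel(), m_sct.ravel())
    # SSIM; data_range assumed from the [-1000, 2500] HU clipping window.
    results["ssim"] = structural_similarity(ct, sct, data_range=3500.0)
    return results

# Paired comparison across test subjects between two models, e.g.:
# p_value = wilcoxon(mae_per_subject_model_a, mae_per_subject_model_b).pvalue
```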

2.4. Qualitative image evaluation

sCT image sets from 60 test subjects, generated by 4 of the 12 models, were qualitatively evaluated by a dual-board-certified radiologist/nuclear medicine physician with fellowship training in body MRI and nuclear medicine, who interprets brain PET/CT and PET/MRI as part of routine clinical practice. The 4 training models chosen were: Unsupervised Training, Fine-Tuned Training with 1 and 100 paired set(s), and Supervised Training with 100 paired sets. These models were chosen to evaluate the impact of fine-tuning, of the number of paired sets used for fine-tuning, and of full supervision on subjective sCT image quality and the accuracy of anatomic/abnormality depiction. For each subject, the real CT and sCT series were randomly rearranged and relabeled, with the radiologist blinded to the real versus synthetic nature of the CT images as well as to the type of training model used to generate each sCT series. The radiologist reviewed all image sets for each subject side-by-side and ranked them from 1 to 5, where 1 was assigned to the series the radiologist considered the real CT and 2 through 5 to the series identified as synthetic, in order of overall most to least similar to the presumed real CT. The radiologist was also asked to comment on whether the sCT series would be accurate/adequate enough for diagnostic purposes if used alone to replace a CT in routine clinical practice.

3. Results

3.1. Quantitative evaluation

Table 1 summarizes the quantitative evaluation of all models for both the normal-appearing and abnormal testing sets. Compared to the Unsupervised Training model, all other models demonstrated statistically significant differences in the evaluation metrics. Violin plots of the MAE are depicted in Fig. 5, and of the Dice score and SSIM in Fig. 6. The table indicates that Unsupervised Training with 2061 data sets yields the lowest Dice score and highest MAE among all models. Unsupervised training using a small set of spatially registered data (10 and 25 pairs) instead of the large set of spatially unregistered subjects led to significant quantitative improvement. For example, using only 10 spatially registered data sets for unsupervised training, the MAE of the bone regions improved by 27% and 24% for the normal and abnormal cases, respectively, relative to unsupervised training with the 2061 spatially unregistered data sets. Additional improvement was observed when 25 spatially registered data sets were used for the unsupervised training.

Table 1.

The evaluated metrics of all trained models, split into normal and abnormal test cases. Each cell reports mean ± standard deviation, with the p value of the comparison against the unsupervised method (two-sided Wilcoxon signed-rank test) in parentheses. MAE is in HU. The Fine-Tuned and Supervised models with 100 pairs showed the best performance for both the normal and abnormal cases.

Normal cases (n = 223):

| Model | Bone MAE (HU) | Bone Dice | Soft-tissue MAE (HU) | Soft-tissue Dice | SSIM |
| --- | --- | --- | --- | --- | --- |
| Unsupervised (2061 unpaired sets) | 78.47 ± 12.09 | 0.75 ± 0.03 | 64.25 ± 10.72 | 0.84 ± 0.02 | 0.81 ± 0.02 |
| Unsupervised (10 registered pairs) | 57.19 ± 14.16 (1.83E−36) | 0.83 ± 0.05 (3.46E−36) | 51.94 ± 9.33 (6.53E−36) | 0.87 ± 0.02 (4.72E−36) | 0.84 ± 0.02 (1.59E−35) |
| Unsupervised (25 registered pairs) | 54.41 ± 11.06 (1.61E−37) | 0.84 ± 0.04 (1.63E−37) | 48.42 ± 9.01 (1.61E−37) | 0.88 ± 0.02 (1.63E−37) | 0.85 ± 0.02 (1.66E−37) |
| Fine-Tuned (1 pair) | 58.59 ± 17.25 (1.54E−33) | 0.83 ± 0.06 (1.32E−35) | 49.37 ± 10.38 (5.33E−36) | 0.87 ± 0.03 (1.75E−34) | 0.88 ± 0.02 (1.61E−37) |
| Fine-Tuned (10 pairs) | 47.73 ± 16.02 (2.40E−37) | 0.86 ± 0.06 (2.12E−37) | 43.11 ± 9.61 (1.98E−37) | 0.89 ± 0.02 (2.18E−37) | 0.88 ± 0.02 (1.61E−37) |
| Supervised (10 pairs) | 46.02 ± 14.59 (1.68E−37) | 0.86 ± 0.05 (2.21E−37) | 41.5 ± 9.37 (1.63E−37) | 0.9 ± 0.02 (2.04E−37) | 0.89 ± 0.02 (1.61E−37) |
| Fine-Tuned (25 pairs) | 40.35 ± 12.64 (1.61E−37) | 0.88 ± 0.04 (1.61E−37) | 37.77 ± 8.66 (1.61E−37) | 0.91 ± 0.02 (1.61E−37) | 0.9 ± 0.02 (1.61E−37) |
| Supervised (25 pairs) | 40.31 ± 12.17 (1.61E−37) | 0.88 ± 0.04 (1.61E−37) | 36.88 ± 8.64 (1.61E−37) | 0.91 ± 0.02 (1.61E−37) | 0.9 ± 0.02 (1.61E−37) |
| Fine-Tuned (50 pairs) | 36.99 ± 12.02 (1.61E−37) | 0.89 ± 0.04 (1.61E−37) | 34.19 ± 8.94 (1.61E−37) | 0.91 ± 0.02 (1.61E−37) | 0.9 ± 0.02 (1.61E−37) |
| Supervised (50 pairs) | 37.98 ± 11.96 (1.61E−37) | 0.89 ± 0.04 (1.61E−37) | 34.51 ± 8.89 (1.61E−37) | 0.91 ± 0.02 (1.61E−37) | 0.9 ± 0.02 (1.61E−37) |
| Fine-Tuned (100 pairs) | 35.56 ± 10.28 (1.61E−37) | 0.9 ± 0.03 (1.61E−37) | 32.97 ± 8.41 (1.61E−37) | 0.92 ± 0.02 (1.61E−37) | 0.9 ± 0.02 (1.61E−37) |
| Supervised (100 pairs) | 35.49 ± 10.18 (1.61E−37) | 0.9 ± 0.03 (1.61E−37) | 33.1 ± 8.36 (1.61E−37) | 0.92 ± 0.02 (1.61E−37) | 0.91 ± 0.02 (1.61E−37) |

Abnormal cases (n = 30):

| Model | Bone MAE (HU) | Bone Dice | Soft-tissue MAE (HU) | Soft-tissue Dice | SSIM |
| --- | --- | --- | --- | --- | --- |
| Unsupervised (2061 unpaired sets) | 81.49 ± 15.44 | 0.74 ± 0.04 | 67.05 ± 11.57 | 0.83 ± 0.02 | 0.81 ± 0.03 |
| Unsupervised (10 registered pairs) | 60.1 ± 12.05 (1.73E−06) | 0.82 ± 0.04 (1.73E−06) | 55.59 ± 11.03 (1.73E−06) | 0.86 ± 0.03 (1.73E−06) | 0.83 ± 0.02 (1.49E−05) |
| Unsupervised (25 registered pairs) | 57.84 ± 12.22 (1.73E−06) | 0.83 ± 0.04 (1.73E−06) | 52.41 ± 11.18 (1.73E−06) | 0.87 ± 0.03 (1.73E−06) | 0.85 ± 0.02 (1.73E−06) |
| Fine-Tuned (1 pair) | 61.26 ± 14.87 (1.92E−06) | 0.82 ± 0.04 (1.73E−06) | 52.09 ± 9.9 (1.73E−06) | 0.87 ± 0.02 (1.73E−06) | 0.87 ± 0.02 (1.73E−06) |
| Fine-Tuned (10 pairs) | 52.33 ± 13.59 (1.73E−06) | 0.84 ± 0.04 (3.18E−06) | 47.59 ± 11.11 (1.92E−06) | 0.88 ± 0.03 (2.60E−06) | 0.87 ± 0.02 (1.73E−06) |
| Supervised (10 pairs) | 50.73 ± 13.02 (1.73E−06) | 0.85 ± 0.04 (1.92E−06) | 46.69 ± 9.9 (1.73E−06) | 0.88 ± 0.03 (1.92E−06) | 0.87 ± 0.02 (1.73E−06) |
| Fine-Tuned (25 pairs) | 44.95 ± 12.2 (1.73E−06) | 0.87 ± 0.04 (1.73E−06) | 42.75 ± 9.58 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) |
| Supervised (25 pairs) | 45.51 ± 12.66 (1.73E−06) | 0.87 ± 0.04 (1.73E−06) | 42.22 ± 9.85 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) |
| Fine-Tuned (50 pairs) | 42.68 ± 12.37 (1.73E−06) | 0.88 ± 0.04 (1.73E−06) | 39.74 ± 9.81 (1.73E−06) | 0.9 ± 0.02 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) |
| Supervised (50 pairs) | 43.13 ± 12.5 (1.73E−06) | 0.87 ± 0.04 (1.73E−06) | 39.78 ± 9.96 (1.73E−06) | 0.9 ± 0.02 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) |
| Fine-Tuned (100 pairs) | 40.71 ± 11.45 (1.73E−06) | 0.88 ± 0.04 (1.73E−06) | 38.57 ± 9.62 (1.73E−06) | 0.9 ± 0.02 (1.73E−06) | 0.89 ± 0.02 (1.73E−06) |
| Supervised (100 pairs) | 40.78 ± 11.68 (1.73E−06) | 0.88 ± 0.04 (1.73E−06) | 38.56 ± 9.43 (1.73E−06) | 0.9 ± 0.02 (1.73E−06) | 0.9 ± 0.02 (1.73E−06) |

Fig. 5.

Violin plots of the MAE values for the normal-appearing and abnormal cases. The unsupervised model showed the worst performance, while the Fine-Tuned and Supervised models with 100 pairs performed best. It can also be seen that unsupervised training with 10 and 25 registered pairs performed better than unsupervised training with the 2061 unregistered data sets.

Fig. 6.

Violin plots of the Dice coefficient scores and SSIM for normal-appearing and abnormal cases. The Fine-Tuned and Supervised models with more than 25 pairs performed well for both the abnormal and normal cases. For the normal cases, a Dice score of 0.9 or above is observed for both the bone and soft-tissue regions. Unsupervised training with the large unregistered data set shows the lowest Dice score and SSIM values.

For the Fine-Tuned models, it can be observed that using just one spatially registered data set to fine-tune the unsupervised model (trained with spatially unregistered data) improved the MAE, Dice score, and SSIM of the evaluated subjects: the bone-region Dice score improved by 10.6% and 10.8% for the normal and abnormal cases, respectively, as presented in Table 1. Fine-tuning with more spatially registered data sets led to increasingly better results, with 100 pairs yielding Dice scores of approximately 0.9 or higher for both the normal and abnormal cases. Notably, Supervised Training produced nearly the same results as Fine-Tuned training, and it is difficult to distinguish between them. As can be seen in Figs. 5 and 6, the unsupervised model has the widest range of metric values among all trained models.

Example performance of the different models in normal cases is depicted in Fig. 4. Although the unsupervised model (with 2061 unregistered subjects) appears to perform well in most regions, some artifacts can be observed in bone regions. Consistent with Table 1 and Figs. 5 and 6, the images from the Fine-Tuned model with one pair appear more realistic than those from the unsupervised model, especially in bone-containing regions. The Fine-Tuned and Supervised models with 100 pairs provide the most realistic results, and their differences are negligible.

Fig. 4.

Example images from different models for select normal cases, from left to right: (a) Input MRI, (b) Real CT, (c) Unsupervised Training sCT, (d) Fine-Tuned Training sCT with 1 pair, (e) Fine-Tuned Training sCT with 100 pairs, (f) Supervised Training sCT with 100 pairs.

3.2. Qualitative evaluation

The radiologist successfully identified the real CT for all 60 test subjects. None of the synthetic CT images were considered of diagnostic quality if used alone. Even the most similar images, although likely adequate for radiation treatment planning and attenuation correction purposes based on the quantitative findings, contained subtle differences from the real CT that were considered significant enough to render the images "non-diagnostic" (Fig. 7). Among all sCT series, the Supervised and Fine-Tuned models with 100 pairs were ranked first and second for the normal cases, with respective average (standard deviation) rank scores of 2.20 (0.48) and 2.83 (0.37), and second and first for the abnormal cases, with respective averages of 2.73 (0.69) and 2.47 (0.57). The Fine-Tuned model with 1 pair and the unsupervised model were found to be least similar to the real CT series, with respective average (standard deviation) rank scores of 4.16 (0.74) and 4.70 (0.46) for the abnormal cases, and 4.36 (0.55) and 4.60 (0.49) for the normal cases.

Fig. 7.

Example images from different models for two abnormal cases. Top row: Axial MRI and real CT images demonstrate posterior left convexity craniotomy bone defects (white arrows). The synthetic CT images obtained from Unsupervised Training with 2061 data sets and from Fine-Tuned Training with 1 spatially registered pair both fail to depict the defects. The sCT from Unsupervised Training with 2061 data sets demonstrates an erroneous configuration of the right temporal bone (red arrow), and the sCT from Fine-Tuned Training with 1 pair demonstrates an erroneously thickened calvarium and hyperattenuating/bone-like intracalvarial extensions (yellow arrow). The sCTs from the Fine-Tuned and Supervised models with 100 pairs depict the calvarial contours more accurately than Unsupervised Training with 2061 data sets and Fine-Tuned Training with 1 pair, with faint foci of hypoattenuation corresponding to the craniotomy bone defects (blue and green arrows), although not as clearly as on the real CT. Bottom row: Axial MR and CT images demonstrate prior right facial/skull base surgery including right enucleation. The synthetic CT images from Unsupervised Training with 2061 data sets and Fine-Tuned Training with 1 pair both erroneously predict the presence of bony structures at the enucleation site (red arrows); the sCT from Fine-Tuned Training with 1 pair additionally demonstrates erroneous hyperattenuating/bone-like intracalvarial extensions (yellow arrow). The sCTs from the Fine-Tuned and Supervised models with 100 pairs depict the surgical defect with much more favorable accuracy.

Fig. 8 demonstrates how adding one registered (paired) data set improves both the delineation of pathology and the generation of normal soft tissues, such as the brain parenchyma. Furthermore, the addition of more registered data sets resulted in even more quantitatively accurate sCTs.

Fig. 8.

MRI and CT images demonstrate soft tissue abnormalities due to post-surgical changes (top row) and a lesion (bottom row). The unsupervised sCT images suffer from significant inaccuracy in depicting the abnormality (red arrows); the normal structures are also depicted inaccurately, with an MRI-like morphology (red arrowheads). Fine-tuned training with 1 registered data set performs significantly better than unsupervised training (blue arrows). However, the best results are achieved with fine-tuned or supervised training using 100 registered data sets (green arrows), with soft tissue attenuation and a normal parenchyma pattern that are near-identical to the real CT.

4. Discussion

Generation of sCT from MR images has tremendous clinical value; while the sCT may be considered as “non-diagnostic” for clinical purposes, accurate sCT images can be considered as quantitatively adequate in the setting of radiotherapy or for attenuation correction for PET/MRI. Reliable sCT generation can reduce cost and radiation exposure, improve workflow, reduce delays, and minimize errors due to misregistration of separately-acquired CT and MR images.

We evaluated the performance of the cycleGAN architecture in generating sCT from MR images using training sets that included various combinations of over two thousand spatially unregistered CT and MR data sets and up to 100 spatially registered data sets, including both normal and abnormal exams. Overall, models with any level of supervised training performed better than the fully unsupervised model. Moreover, models with a higher degree of supervision (i.e., a higher number of spatially registered/paired data sets) outperformed those with less supervision. This observation is consistent with the fact that unsupervised training with dual-learning GAN methods lacks pixel-wise supervision in the target domain, which can lead to undesirable changes in the content of the translated images (Zhang et al., 2019; Cohen et al., 2018). Hence, although the cyclic images may perfectly match the input, the translated image may not preserve the intrinsic features of the input. Conversely, the addition of spatially registered data sets, as done in this work, improves the output. CycleGAN performs well for one-to-one mappings (Almahairi et al., 2018) when the two domains have the same intrinsic dimensions; however, MRI-to-CT translation cannot be considered a one-to-one translation.

Several approaches (Almahairi et al., 2018; Zhou et al., 2017; Zhang et al., 2019; Chen et al., 2021) address these limitations of CycleGAN; however, when a set of paired data is available, fine-tuning can be used to improve the results. We observed that fine-tuning the unsupervised model with just one spatially registered pair improved performance significantly. This suggests that if only a small number of spatially registered (paired) data sets are available, they should be used to fine-tune a CycleGAN trained with spatially unregistered data sets to yield superior results. That said, the addition of further spatially registered data sets led to further improvement of the results.

The overall performance of the training models was superior for normal cases compared to those with pathology. This observation was not unexpected, as normal cases made up the majority of the training set, which is a limitation of this study. Given the observation that some level of supervised training produced the best outcomes, it is reasonable to hypothesize that for an abnormality to appear accurately on sCT (e.g., a fracture), the training set should include paired/registered image sets containing examples of that class of abnormality. Consequently, for a model to generate accurate sCT images reflecting the wide variety of abnormalities seen in head imaging, a large data set of patients with several presentations of various pathologies would be needed. However, the generation of such a data set is not a trivial task and was not fully evaluated herein.

The observed superior performance of all models in predicting soft tissue compared to bone (by quantitative metrics) may be due to the better depiction of soft tissue than bone in the input MR images, an inherent limitation of routine clinical MR imaging techniques. Another limitation of this study is the presence of artifacts in the unpaired training sets. It is anticipated that artifacts within the unpaired training data contributed to the lower performance of the unsupervised model, especially since the spatially registered data sets did not contain artifacts; however, it is unlikely that artifacts were the sole contributor. It should also be noted that this work is limited to a single pulse sequence at a single field strength (1.5T) from a single vendor. Since fine-tuning showed the ability to improve results significantly, it can likely be used to generalize a model trained for a specific vendor to other vendors or field strengths. The results also suggest that spatial misalignment between the MR and CT images reduces the capability of the model, and a spatially registered training data set yielded superior performance compared to unregistered, unsupervised data. Moreover, for fine-tuned and supervised training models with more than 25 paired subjects, the difference between fine-tuned and supervised training is almost negligible. This suggests that if enough paired subjects are available, supervised training alone can be an acceptable option for sCT generation. These findings are concordant with previously published works utilizing smaller data sets.

The quantitative results obtained herein compare favorably with previous studies that used CycleGAN for sCT generation in the brain. For example, Yang et al. (2018) observed that although the reconstructed MR image is almost identical to the input MR image (cycle consistency), the synthetic CT image differs from what is expected (especially in the skull region). Their results imply that supervised training led to superior performance compared to unsupervised training, even with their modifications to the CycleGAN. Moreover, Li et al. (2020) reported superior performance of a U-Net compared to CycleGAN in generating synthetic MR images from CT. Later, T1-weighted MR and CT images acquired from 173 nasopharyngeal carcinoma (NPC) patients were used by Peng et al. (2020) to train a cGAN (conditional generative adversarial network) and a cycleGAN, demonstrating better performance for the cGAN.

5. Conclusion

Although quantitative metrics suggested promising performance for the fine-tuned and supervised models for sCT generation, these models are still limited by the need for spatially registered MRI and CT images. Furthermore, although more than 2000 unpaired cases provided a large sample to assess the performance of the unsupervised training model, this work did not evaluate larger sample sizes (e.g., data sets of n=10,000 or 100,000). It was demonstrated that the number of spatially registered data sets plays a significant role in the quality of sCT images, and a large spatially unregistered data set does not necessarily overcome the absence of registered data sets, i.e., quality is favored over quantity. Note that obtaining registered data sets is even more difficult and less practical for body regions other than the head, particularly in the chest and abdomen.

In conclusion, this study shows the importance of including spatially registered images in training. The trained models were evaluated on a large test set including a variety of normal and abnormal cases and showed good quantitative results for both the normal and abnormal test cases. It was observed that if even a small set of registered (paired) images is available and can be included in training, whether for fine-tuned or supervised training, performance can be superior to training with a large spatially unregistered (unpaired) data set alone. This is applicable to other body parts, where often only a small set of spatially registered (paired) data can be obtained and acquiring a large registered data set is not feasible. Therefore, as explored in this paper, even a small number of spatially registered images can improve the results significantly.

Fig. 9.

Additional illustrations of how various models performed for specific abnormal cases, from left to right: (a) MR, (b) Real CT, (c) Unsupervised, (d) Fine-Tuned with 1 pair, (e) Fine-Tuned with 100 pairs, (f) Supervised with 100 pairs.

Fig. 10.

Additional illustrations of how various models performed for specific abnormal cases, from left to right: (a) MR, (b) Real CT, (c) Unsupervised, (d) Fine-Tuned with 1 pair, (e) Fine-Tuned with 100 pairs, (f) Supervised with 100 pairs.

Acknowledgments

Research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number R01EB026708. Ali Pirasteh is in part supported by grants KL2TR002374 and UL1TR002373 awarded to UW ICTR through NIH NCATS.

Footnotes

Declaration of competing interest

Ali Pirasteh reports a relationship with TheraCea that includes: consulting or advisory. Ali Pirasteh reports a relationship with Sanofi Genzyme that includes: consulting or advisory. The Department of Radiology at the University of Wisconsin School of Medicine and Public Health receives research support from GE Healthcare.

CRediT authorship contribution statement

Seyed Iman Zare Estakhraji: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Ali Pirasteh: Study Design, Data analysis, Manuscript edit. Tyler Bradshaw: Study design, Manuscript edit. Alan McMillan: Conceptualization, Methodology, Study design, Data acquisition and analysis, Manuscript preparation/edit.

Appendix. Supplemental

Some examples of the performance of different models for the abnormal cases are presented in Figs. 9 and 10.

Data availability

The authors do not have permission to share data.

References

1. Abraham Alexandre, Pedregosa Fabian, Eickenberg Michael, Gervais Philippe, Mueller Andreas, Kossaifi Jean, Gramfort Alexandre, Thirion Bertrand, Varoquaux Gaël, 2014. Machine learning for neuroimaging with scikit-learn. Front. Neuroinform. 8, 14.
2. Almahairi Amjad, Rajeshwar Sai, Sordoni Alessandro, Bachman Philip, Courville Aaron, 2018. Augmented CycleGAN: Learning many-to-many mappings from unpaired data. In: International Conference on Machine Learning. PMLR, pp. 195–204.
3. Boulanger M, Nunes Jean-Claude, Chourak H, Largent A, Tahri S, Acosta O, De Crevoisier R, Lafond C, Barateau A, 2021. Deep learning methods to generate synthetic CT from MRI in radiotherapy: A literature review. Phys. Med. 89, 265–281.
4. Chen Hongqian, Guan Mengxi, Li Hui, 2021. ArCycleGAN: Improved CycleGAN for style transferring of fruit images. IEEE Access 9, 46776–46787.
5. Chen Shupeng, Qin An, Zhou Dingyi, Yan Di, 2018. U-net-generated synthetic CT images for magnetic resonance imaging-only prostate intensity-modulated radiation therapy treatment planning. Med. Phys. 45 (12), 5659–5665.
6. Cohen Joseph Paul, Luck Margaux, Honari Sina, 2018. Distribution matching losses can hallucinate features in medical image translation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16–20, 2018, Proceedings, Part I. Springer, pp. 529–536.
7. The MONAI Consortium, 2020. Project MONAI. 10.5281/zenodo.4323059.
8. Devic Slobodan, 2012. MRI simulation for radiotherapy treatment planning. Med. Phys. 39 (11), 6701–6711.
9. Edmund Jens M., Nyholm Tufve, 2017. A review of substitute CT generation for MRI-only radiation therapy. Radiat. Oncol. 12 (1), 1–15.
10. Emami Hajar, Dong Ming, Nejad-Davarani Siamak P, Glide-Hurst Carri K, 2018. Generating synthetic CTs from magnetic resonance images using generative adversarial networks. Med. Phys. 45 (8), 3627–3636.
11. Goodfellow Ian, Pouget-Abadie Jean, Mirza Mehdi, Xu Bing, Warde-Farley David, Ozair Sherjil, Courville Aaron, Bengio Yoshua, 2014. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27.
12. He Di, Xia Yingce, Qin Tao, Wang Liwei, Yu Nenghai, Liu Tie-Yan, Ma Wei-Ying, 2016. Dual learning for machine translation. Adv. Neural Inf. Process. Syst. 29.
13. Isola Phillip, Zhu Jun-Yan, Zhou Tinghui, Efros Alexei A, 2017. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1125–1134.
14. Jabbarpour Amir, Mahdavi Seied Rabi, Sadr Alireza Vafaei, Esmaili Golbarg, Shiri Isaac, Zaidi Habib, 2022. Unsupervised pseudo CT generation using heterogenous multicentric CT/MR images and CycleGAN: Dosimetric assessment for 3D conformal radiotherapy. Comput. Biol. Med. 143, 105277.
15. Jang Hyungseok, Liu Fang, Zhao Gengyan, Bradshaw Tyler, McMillan Alan B, 2018. Deep learning based MRAC using rapid ultrashort echo time imaging. Med. Phys. 45 (8), 3697–3704.
16. Johnstone Emily, Wyatt Jonathan J, Henry Ann M, Short Susan C, Sebag-Montefiore David, Murray Louise, Kelly Charles G, McCallum Hazel M, Speight Richard, 2018. Systematic review of synthetic computed tomography generation methodologies for use in magnetic resonance imaging-only radiation therapy. Int. J. Radiat. Oncol. Biol. Phys. 100 (1), 199–217.
17. Jonsson Joakim, Nyholm Tufve, Söderkvist Karin, 2019. The rationale for MR-only treatment planning for external radiotherapy. Clin. Transl. Radiat. Oncol. 18, 60–65.
18. Karlsson Mikael, Karlsson Magnus G, Nyholm Tufve, Amies Christopher, Zackrisson Björn, 2009. Dedicated magnetic resonance imaging in the radiotherapy clinic. Int. J. Radiat. Oncol. Biol. Phys. 74 (2), 644–651.
19. Kazemifar Samaneh, McGuire Sarah, Timmerman Robert, Wardak Zabi, Nguyen Dan, Park Yang, Jiang Steve, Owrangi Amir, 2019. MRI-only brain radiotherapy: Assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach. Radiother. Oncol. 136, 56–63.
20. Kerkmeijer LGW, Maspero M, Meijer GJ, van der Voort van Zyp JRN, De Boer HCJ, van den Berg CAT, 2018. Magnetic resonance imaging only workflow for radiotherapy simulation and planning in prostate cancer. Clin. Oncol. 30 (11), 692–701.
21. Khoo VS, Joon DL, 2006. New developments in MRI for target volume delineation in radiotherapy. Br. J. Radiol. 79 (special issue 1), S2–S15.
22. Kim Taeksoo, Cha Moonsu, Kim Hyunsoo, Lee Jung Kwon, Kim Jiwon, 2017. Learning to discover cross-domain relations with generative adversarial networks. In: International Conference on Machine Learning. PMLR, pp. 1857–1865.
23. Klages Peter, Benslimane Ilyes, Riyahi Sadegh, Jiang Jue, Hunt Margie, Deasy Joseph O, Veeraraghavan Harini, Tyagi Neelam, 2020. Patch-based generative adversarial neural network models for head and neck MR-only planning. Med. Phys. 47 (2), 626–642.
24. Lei Yang, Harms Joseph, Wang Tonghe, Liu Yingzi, Shu Hui-Kuo, Jani Ashesh B, Curran Walter J, Mao Hui, Liu Tian, Yang Xiaofeng, 2019. MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks. Med. Phys. 46 (8), 3565–3581.
25. Li Wen, Li Yafen, Qin Wenjian, Liang Xiaokun, Xu Jianyang, Xiong Jing, Xie Yaoqin, 2020. Magnetic resonance image (MRI) synthesis from brain computed tomography (CT) images based on deep learning methods for magnetic resonance (MR)-guided radiotherapy. Quant. Imaging Med. Surg. 10 (6), 1223.
26. Liu Fang, Jang Hyungseok, Kijowski Richard, Zhao Gengyan, Bradshaw Tyler, McMillan Alan B, 2018. A deep learning approach for 18F-FDG PET attenuation correction. EJNMMI Phys. 5 (1), 1–15.
27. Liu Fang, Yadav Poonam, Baschnagel Andrew M, McMillan Alan B, 2019. MR-based treatment planning in radiation therapy using a deep learning approach. J. Appl. Clin. Med. Phys. 20 (3), 105–114.
28. Maspero Matteo, Savenije Mark HF, Dinkla Anna M, Seevinck Peter R, Intven Martijn PW, Jurgenliemk-Schulz Ina M, Kerkmeijer Linda GW, van den Berg Cornelis AT, 2018. Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy. Phys. Med. Biol. 63 (18), 185001.
29. McMillan Alan B., Bradshaw Tyler J., 2021. Artificial intelligence-based data corrections for attenuation and scatter in positron emission tomography and single-photon emission computed tomography. PET Clin. 16 (4), 543–552.
30. Nie Dong, Trullo Roger, Lian Jun, Wang Li, Petitjean Caroline, Ruan Su, Wang Qian, Shen Dinggang, 2018. Medical image synthesis with deep convolutional adversarial networks. IEEE Trans. Biomed. Eng. 65 (12), 2720–2730.
31. Peng Yinglin, Chen Shupeng, Qin An, Chen Meining, Gao Xingwang, Liu Yimei, Miao Jingjing, Gu Huikuan, Zhao Chong, Deng Xiaowu, et al., 2020. Magnetic resonance-based synthetic computed tomography images generated using generative adversarial networks for nasopharyngeal carcinoma radiotherapy treatment planning. Radiother. Oncol. 150, 217–224.
32. Price Ryan G, Kim Joshua P, Zheng Weili, Chetty Indrin J, Glide-Hurst Carri, 2016. Image guided radiation therapy using synthetic computed tomography images in brain cancer. Int. J. Radiat. Oncol. Biol. Phys. 95 (4), 1281–1289.
33. Schmidt Maria A., Payne Geoffrey S., 2015. Radiotherapy planning using MRI. Phys. Med. Biol. 60 (22), R323.
34. Spadea Maria Francesca, Maspero Matteo, Zaffino Paolo, Seco Joao, 2021. Deep learning based synthetic-CT generation in radiotherapy and PET: A review. Med. Phys. 48 (11), 6537–6566.
35. Thrall James H, Li Xiang, Li Quanzheng, Cruz Cinthia, Do Synho, Dreyer Keith, Brink James, 2018. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J. Am. Coll. Radiol. 15 (3), 504–508.
36. Tustison Nicholas J, Cook Philip A, Holbrook Andrew J, Johnson Hans J, Muschelli John, Devenyi Gabriel A, Duda Jeffrey T, Das Sandhitsu R, Cullen Nicholas C, Gillen Daniel L, et al., 2021. The ANTsX ecosystem for quantitative biological and medical imaging. Sci. Rep. 11 (1), 1–13.
37. van der Walt Stéfan, Schönberger Johannes L., Nunez-Iglesias Juan, Boulogne François, Warner Joshua D., Yager Neil, Gouillart Emmanuelle, Yu Tony, the scikit-image contributors, 2014. scikit-image: image processing in Python. PeerJ 2, e453. 10.7717/peerj.453.
38. Virtanen Pauli, Gommers Ralf, Oliphant Travis E., Haberland Matt, Reddy Tyler, Cournapeau David, Burovski Evgeni, Peterson Pearu, Weckesser Warren, Bright Jonathan, van der Walt Stéfan J., Brett Matthew, Wilson Joshua, Millman K. Jarrod, Mayorov Nikolay, Nelson Andrew R.J., Jones Eric, Kern Robert, Larson Eric, Carey CJ, Polat İlhan, Feng Yu, Moore Eric W., VanderPlas Jake, Laxalde Denis, Perktold Josef, Cimrman Robert, Henriksen Ian, Quintero E.A., Harris Charles R., Archibald Anne M., Ribeiro Antônio H., Pedregosa Fabian, van Mulbregt Paul, SciPy 1.0 Contributors, 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272. 10.1038/s41592-019-0686-2.
39. Wolterink Jelmer M, Dinkla Anna M, Savenije Mark HF, Seevinck Peter R, van den Berg Cornelis AT, Išgum Ivana, 2017. Deep MR to CT synthesis using unpaired data. In: International Workshop on Simulation and Synthesis in Medical Imaging. Springer, pp. 14–23.
40. Xiang Lei, Li Yang, Lin Weili, Wang Qian, Shen Dinggang, 2018. Unpaired deep cross-modality synthesis with fast training. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, pp. 155–164.
41. Yang Heran, Sun Jian, Carass Aaron, Zhao Can, Lee Junghoon, Xu Zongben, Prince Jerry, 2018. Unpaired brain MR-to-CT synthesis using a structure-constrained CycleGAN. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, pp. 174–182.
42. Yi Zili, Zhang Hao, Tan Ping, Gong Minglun, 2017. DualGAN: Unsupervised dual learning for image-to-image translation. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2849–2857.
43. Zhang Rui, Pfister Tomas, Li Jia, 2019. Harmonic unpaired image-to-image translation. arXiv preprint arXiv:1902.09727.
44. Zhou Shuchang, Xiao Taihong, Yang Yi, Feng Dieqiao, He Qinyao, He Weiran, 2017. GeneGAN: Learning object transfiguration and attribute subspace from unpaired data. arXiv preprint arXiv:1705.04932.
45. Zhu Jun-Yan, Park Taesung, Isola Phillip, Efros Alexei A, 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2223–2232.
