Author manuscript; available in PMC: 2020 Apr 17.
Published before final editing as: IEEE Trans Med Imaging. 2018 Oct 17:10.1109/TMI.2018.2876633. doi: 10.1109/TMI.2018.2876633

SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth

Yuankai Huo 1, Zhoubing Xu 2, Hyeonsoo Moon 3, Shunxing Bao 4, Albert Assad 5, Tamara K Moyo 6, Michael R Savona 7, Richard G Abramson 8, Bennett A Landman 9
PMCID: PMC6504618  NIHMSID: NIHMS1526444  PMID: 30334788

Abstract

A key limitation of deep convolutional neural network (DCNN)-based image segmentation methods is the lack of generalizability. Manually traced training images are typically required when segmenting organs in a new imaging modality or from a distinct disease cohort. The manual effort can be alleviated if the manually traced images in one imaging modality (e.g., MRI) can be used to train a segmentation network for another imaging modality (e.g., CT). In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a segmentation network for a target imaging modality without manual labels in that modality. SynSeg-Net is trained using 1) unpaired intensity images from the source and target modalities and 2) manual labels from the source modality only. SynSeg-Net is enabled by recent advances in cycle generative adversarial networks and DCNNs. We evaluate the performance of SynSeg-Net in two experiments: 1) MRI to CT splenomegaly synthetic segmentation for abdominal images and 2) CT to MRI total intracranial volume synthetic segmentation for brain images. The proposed end-to-end approach achieved superior performance to two-stage methods. Moreover, SynSeg-Net achieved comparable performance to the traditional segmentation network using target modality labels in certain scenarios. The source code of SynSeg-Net is publicly available.

Keywords: Synthesis, segmentation, splenomegaly, TICV, synthetic segmentation, GAN, adversarial, DCNN, convolutional

I. Introduction

Deep learning techniques have proven effective for medical image synthesis across (1) different sequence types within the same imaging modality (e.g., between T1w, T2w, PD, FLAIR, etc.) and (2) different imaging modalities (e.g., MRI to CT, CT to MRI, etc.) [1]. While there are impediments to using synthetic images directly in clinical practice, synthetic images have been shown to be an effective intermediate representation for image processing, including registration [2], data augmentation [3], and segmentation [4]. Historically, paired training data for both imaging modalities were typically required for image synthesis. Recent advances with cycle generative adversarial networks (CycleGAN) [5] have demonstrated high quality cross-modality image synthesis without paired data.

In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a DCNN segmentation network without manual labels on the target imaging modality. The network is trained on unpaired source and target modality images with manual segmentations only on the source modality (Figure 1). This method reduces the manual segmentation effort in medical image analysis by taking advantage of cross-modality image synthesis learning.

Fig. 1.


The proposed synthetic segmentation network (SynSeg-Net) is able to train a CT splenomegaly segmentation network from unpaired MRI and CT training images without using manual CT labels.

To evaluate the segmentation performance of the proposed SynSeg-Net, two experiments were conducted. The first experiment performed CT splenomegaly (extraordinarily large spleen) synthetic segmentation without any spleen labels on CT images. The second experiment performed MRI total intracranial volume (TICV) synthetic segmentation without any TICV labels on MRI. In the empirical validations, the proposed end-to-end approach achieved superior performance to the two-stage methods. Moreover, SynSeg-Net achieved comparable performance to the traditional way of training a segmentation network using target modality labels in certain scenarios. Note that “comparable performance” in this paper means that two methods do not show a statistically significant difference in segmentation performance.

This work extends our previous conference paper [6] with the following new efforts: (1) the methodology is presented in greater detail, (2) new external validations (MRI to CT) were provided for CT splenomegaly synthetic segmentation, (3) total intracranial volume segmentation was provided as a new experiment (CT to MRI), and (4) the source code of SynSeg-Net has been made publicly available at https://github.com/MASILab/SynSeg-Net.

II. Related Works

A. Cross-Modality Image Synthesis

Medical image synthesis is defined as the generation of realistic images through learning models [1]. From a technical perspective, image synthesis can be achieved from a generative model (e.g., from noise) or a cross-modality adaptation model (e.g., from MRI to CT). Our work is mostly related to the cross-modality image synthesis approaches, in which a synthetic image in target imaging modality is synthesized from a real image in source imaging modality.

Historically, cross-modality image synthesis methods can be grouped into three categories: (1) registration-based methods, (2) intensity-based methods, and (3) deep learning based methods. The registration-based cross-modality image synthesis methods were inspired by Miller et al. [2], in which the synthetic images were obtained by registering a subject image to a collection of co-registered images. Then, Burgos et al. [7] extended this idea to a multi-atlas information propagation scheme by integrating multi-atlas registration and intensity fusion, and applied it to MRI to CT synthesis. Cardoso et al. [8] proposed a variant of this approach by introducing a multi-atlas generative model for image synthesis and outlier detection. The second family of cross-modality image synthesis approaches is intensity-based methods, whose principle is to learn an intensity transformation function that maps source intensities to target intensities [9-16].

Herein, we focus on the third family: deep learning based image synthesis methods. In [17], a location-sensitive deep synthesis method was introduced to utilize both the intensity and spatial information between modalities during the training stage. Sevetlidis et al. proposed a deep encoder-decoder network [18] trained in a patch-based fashion. Xiang et al. [19] proposed a deep embedding convolutional neural network, which utilizes the intermediate feature maps between MRI and CT scans. Nie et al. [20] proposed a context-aware generative adversarial network to generate CT images from MRI images.

Recently, Goodfellow et al. [21] proposed generative adversarial networks (GANs), which provided a new perspective on image synthesis and domain adaptation using either paired training images [22] or unpaired images [5]. GAN-based methods have been successfully applied to a variety of computer vision problems [23, 24] and have been adopted by the medical imaging community [20, 25-27]. Compared with previous adversarial learning based synthesis methods, the cycle-consistent loops lead to more representative synthetic images.

B. Synthetic Segmentation

One major application of image synthesis is to improve segmentation performance. Iglesias et al. [4] demonstrated that synthesized MRI images could improve the segmentation performance (Figure 2a). Several studies used adversarial learning as an extra GAN-based supervision on medical image segmentation networks [28-31]. In this study, we focus on synthetic segmentation, which uses synthetic images as training images to train a segmentation network in the target imaging modality.

Fig. 2.


This figure illustrates the prevalent strategies for performing segmentation without ground truth in the target modality. “Mod. S” means source modality images while “Mod. T” represents target modality images. “Syn. 1” is a source to target transformation generator, while “Syn. 2” is a target to source transformation generator. “Seg. T” is the segmentation network for the target modality. (a) is a two-stage framework that treats the synthesis (left side of the red dashed line) and segmentation (right side of the red dashed line) as two independent training stages. (b) connects the synthesis and segmentation networks in an end-to-end fashion. (c) employs the CycleGAN framework as the synthesis network for unpaired cross-modality image synthesis (left side of the red dashed line), and then performs another independent training stage for segmentation (right side of the red dashed line). (d) is the proposed method, which integrates the cycle adversarial synthesis and segmentation into an end-to-end framework.

Figure 2 presents the different strategies for synthetic segmentation. Kamnitsas et al. [32] introduced unsupervised domain adaptation for brain lesion segmentation (Figure 2b). It reveals the possibility of training a lesion segmentation network using cross-modality synthetic segmentation. However, (1) the source imaging sequence (GE) and target imaging sequence (SWI) still belong to the same MRI modality, and (2) overlapping image modalities (e.g., FLAIR, T2, PD, MPRAGE) were used in both the source and target imaging modalities to ensure performance. Cross-modality synthetic segmentation on two independent imaging mechanisms (e.g., MRI to CT) without overlapping imaging modalities is therefore appealing.

Recently, cycle generative adversarial networks (CycleGAN) [5] provided a promising tool for cross-modality synthesis from unpaired training images [33, 34]. With CycleGAN, one is able to synthesize the images for one imaging modality (e.g., MRI) while targeting another imaging modality (e.g., CT). Using CycleGAN, Chartsias et al. [35] proposed a CT to MRI synthesis method, and then trained another independent MRI segmentation network (called “Seg.”) using the synthetic MRI images (Figure 2c). Although it still used manual labels for both modalities, this two-stage framework (which we refer to as “CycleGAN+Seg.”) revealed a promising direction of integrating cycle adversarial networks in synthetic segmentation.

Building upon CycleGAN, Zhang et al. [3] and our group [6] proposed end-to-end synthesis and segmentation networks. Zhang et al. [3] focus on improving both synthesis and segmentation performance simultaneously using both true images and manual labels on both MRI and CT. Therefore, manual segmentations on the target imaging modality are still used. By contrast, Huo et al. [6] introduced an end-to-end synthesis and segmentation network, which was designed as a synthetic segmentation network that does not use manual labels in the target imaging modality. In this paper, we describe this method in greater detail. Moreover, external validation and new experiments were conducted to evaluate the proposed method as well as the baseline methods.

III. Method

Figure 3 introduces the network design, while preprocessing, postprocessing, hyperparameters and the experimental platforms are presented below.

Fig. 3.


The upper panel shows the network structure of the proposed SynSeg-Net during the training stage. The left side is the CycleGAN synthesis subnet, where S is MRI and T is CT. G1 and G2 are the generators, while D1 and D2 are the discriminators. The right subnet is the segmentation subnet Seg for end-to-end training. Loss functions were added to optimize SynSeg-Net. The lower panel shows the network structure of SynSeg-Net during the testing stage. Only the trained subnet Seg is used to segment a testing image from the target imaging modality.

A. Preprocessing

The intensities of every input MRI scan were normalized to a 0-1 scale, with the highest 2.5% and lowest 2.5% of intensities excluded from the normalization to reduce the effect of outliers. For CT, voxels whose HU values were greater than 1000 were set to 1000, and voxels whose HU values were less than −1000 were set to −1000. Then, the intensities between −1000 and 1000 were normalized to a 0-1 scale. Next, the axial slices from the normalized intensity image volumes (both MRI and CT) were resampled to 256 × 256 using bilinear interpolation, while the corresponding segmentation axial slices were resampled to the same resolution using nearest neighbor interpolation. Hence, the image dimensions (256 × 256) were matched for both modalities, following CycleGAN [5].
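The intensity normalization described above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' MATLAB preprocessing code: the function names `normalize_mri` and `normalize_ct` are hypothetical, and clipping the excluded MRI tails to 0 and 1 is an assumption, since the text states only that the extreme 2.5% of intensities were excluded from the normalization.

```python
import numpy as np

def normalize_mri(volume, lower_pct=2.5, upper_pct=97.5):
    """Scale MRI intensities to [0, 1] using the 2.5th and 97.5th percentiles."""
    volume = np.asarray(volume, dtype=np.float64)
    lo, hi = np.percentile(volume, [lower_pct, upper_pct])
    if hi <= lo:
        # Degenerate (constant) image: nothing to scale.
        return np.zeros_like(volume)
    # Assumption: intensities in the excluded tails are clipped to 0 or 1.
    return np.clip((volume - lo) / (hi - lo), 0.0, 1.0)

def normalize_ct(volume, hu_min=-1000.0, hu_max=1000.0):
    """Clamp HU values to [-1000, 1000], then scale linearly to [0, 1]."""
    volume = np.clip(np.asarray(volume, dtype=np.float64), hu_min, hu_max)
    return (volume - hu_min) / (hu_max - hu_min)

# Toy usage: air-like, water-like, and very dense CT voxels.
ct = normalize_ct(np.array([-2000.0, -1000.0, 0.0, 1000.0, 3000.0]))
mri = normalize_mri(np.arange(101.0))
```

With this mapping, a CT value of 0 HU lands at 0.5, and any value outside the ±1000 HU window saturates at 0 or 1.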

B. SynSeg-Net

Figure 3 presents the network structure of SynSeg-Net, where “S” indicates the source imaging modality (e.g., MRI), while “T” indicates the target imaging modality (e.g., CT). SynSeg-Net consists of two major portions: the cycle synthesis subnet and the segmentation subnet.

1). Cycle Synthesis Subnet:

The 9 block ResNet (defined in [5] and [36]) was employed for the two generators G1 and G2. The generator G1 transferred a real image x in modality S to a synthetic image G1(x) in modality T, while the generator G2 transferred a real image y in modality T to a synthetic image G2(y) in modality S. Next, PatchGAN (defined in [5] and [37]) was used for the two adversarial discriminators D1 and D2. D1 determined whether a provided image was a synthetic image G1(x) or a real image y, while D2 judged whether a provided image was a synthetic image G2(y) or a real image x. When deploying such a network on unpaired images from modalities S and T, two forward training paths (Path A and Path B) were used (Figure 3).

2). Segmentation Subnet:

Since the final aim of the proposed SynSeg-Net was to perform end-to-end synthetic segmentation, we concatenated a segmentation network “Seg” directly after G1, as an extension of training Path A. To be consistent with the cycle synthesis subnet, the same 9 block ResNet [5, 36] was used as Seg, whose network structure was identical to that of G1.
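The two forward paths and the placement of Seg can be traced with placeholder functions. The sketch below is a data-flow illustration only: `g1`, `g2`, and `seg` are hypothetical stand-in callables (a simple intensity flip and a threshold), not the 9 block ResNets used in the paper, and the discriminators are omitted.

```python
import numpy as np

def forward_paths(x, y, g1, g2, seg):
    """Trace Path A (S -> T -> S, extended by Seg) and Path B (T -> S -> T)."""
    # Path A: real source image -> synthetic target -> reconstructed source.
    fake_t = g1(x)
    rec_s = g2(fake_t)
    # Seg extends Path A: during training it only sees synthetic target images,
    # which are supervised by the source-modality manual labels.
    seg_out = seg(fake_t)
    # Path B: real target image -> synthetic source -> reconstructed target.
    fake_s = g2(y)
    rec_t = g1(fake_s)
    return {"fake_t": fake_t, "rec_s": rec_s, "seg": seg_out,
            "fake_s": fake_s, "rec_t": rec_t}

# Toy stand-ins (assumptions for illustration only).
g1 = lambda img: 1.0 - img                      # "source to target" generator
g2 = lambda img: 1.0 - img                      # "target to source" generator
seg = lambda img: (img > 0.5).astype(np.int64)  # "segmentation" network

x = np.array([[0.1, 0.9], [0.4, 0.6]])  # a "source" slice
y = np.array([[0.2, 0.8], [0.7, 0.3]])  # an unpaired "target" slice
out = forward_paths(x, y, g1, g2, seg)
```

Because the toy generators are exact inverses of each other, the reconstructions equal the inputs, which is the behavior the cycle consistency losses encourage in the real networks.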

3). Loss Functions:

In SynSeg-Net, five loss functions have been used during the training stage. After discriminators D1 and D2, two adversarial loss functions were used to train the adversarial generators G1 and G2.

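$$\mathcal{L}_{GAN}(G_1, D_1, S, T) = \mathbb{E}_{y \sim T}[\log D_1(y)] + \mathbb{E}_{x \sim S}[\log(1 - D_1(G_1(x)))] \tag{1}$$
$$\mathcal{L}_{GAN}(G_2, D_2, T, S) = \mathbb{E}_{x \sim S}[\log D_2(x)] + \mathbb{E}_{y \sim T}[\log(1 - D_2(G_2(y)))] \tag{2}$$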
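As a numerical illustration of Equations (1) and (2), the expectations can be estimated by averaging over a batch of discriminator outputs. The sketch assumes the discriminator returns probabilities in (0, 1); the function name `gan_loss` and the small `eps` stabilizer are hypothetical additions, not part of the paper.

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator outputs on real images, values in (0, 1).
    d_fake: discriminator outputs on synthetic images, values in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # eps guards against log(0); it is an implementation convenience.
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A discriminator that cannot tell real from synthetic outputs 0.5 everywhere.
chance_level = gan_loss([0.5, 0.5], [0.5, 0.5])
# A perfect discriminator outputs 1 on real and 0 on synthetic images.
perfect = gan_loss([1.0], [0.0])
```

The discriminator tries to push this value toward 0 (its maximum), while the generator tries to lower it by making synthetic images that the discriminator scores as real.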

Meanwhile, two cycle consistent loss functions were used to minimize the difference between true images and cycle reconstructed images.

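$$\mathcal{L}_{cycle}(G_1, G_2, S) = \mathbb{E}_{x \sim S}[\lVert G_2(G_1(x)) - x \rVert_1] \tag{3}$$
$$\mathcal{L}_{cycle}(G_2, G_1, T) = \mathbb{E}_{y \sim T}[\lVert G_1(G_2(y)) - y \rVert_1] \tag{4}$$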
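Equations (3) and (4) are L1 reconstruction errors averaged over images. A minimal NumPy sketch follows; the function name `cycle_loss` is hypothetical. It sums absolute differences over pixels, following the norm as written; implementations often average over pixels instead, which only rescales the term.

```python
import numpy as np

def cycle_loss(real_batch, reconstructed_batch):
    """Mean over the batch of ||reconstruction - real||_1.

    Both inputs have shape (batch, height, width).
    """
    real = np.asarray(real_batch, dtype=np.float64)
    rec = np.asarray(reconstructed_batch, dtype=np.float64)
    # L1 norm per image: sum of absolute pixel differences.
    per_image = np.abs(rec - real).reshape(real.shape[0], -1).sum(axis=1)
    return float(per_image.mean())

real = np.zeros((2, 2, 2))
rec = np.full((2, 2, 2), 0.5)
loss_value = cycle_loss(real, rec)
```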

The last loss function is the segmentation loss, which was the weighted cross entropy loss.

$$\mathcal{L}_{seg}(Seg, G_1, S) = -\sum_i m_i \cdot \log(Seg(G_1(x_i))) \tag{5}$$
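A weighted cross entropy of this kind can be sketched as follows, reading the label term in Equation (5) as the manual label of the corresponding source image (an interpretation, since the text does not define the symbol). The class weights, the normalization by the weight sum, and the function name `weighted_cross_entropy` are assumptions for illustration; the paper does not state which weights were used.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Weighted cross entropy over N pixels and C classes.

    probs:         (N, C) per-pixel class probabilities from the segmentation net.
    labels:        (N,) integer manual labels.
    class_weights: (C,) weight per class (hypothetical values).
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    weights = np.asarray(class_weights, dtype=np.float64)[labels]
    # Probability the network assigns to the true class of each pixel.
    picked = probs[np.arange(labels.size), labels]
    # Assumption: normalize by the sum of weights (a weighted mean).
    return float(np.sum(-weights * np.log(picked + eps)) / np.sum(weights))

probs = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
upweighted = weighted_cross_entropy(probs, labels, [1.0, 3.0])
```

Raising the weight of a class makes errors on that class count more, which is a common way to compensate for small structures in a mostly background image.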

After defining five loss functions, we added them together by assigning different weights.

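$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{GAN}(G_1, D_1, S, T) + \lambda_2 \mathcal{L}_{GAN}(G_2, D_2, T, S) + \lambda_3 \mathcal{L}_{cycle}(G_1, G_2, S) + \lambda_4 \mathcal{L}_{cycle}(G_2, G_1, T) + \lambda_5 \mathcal{L}_{seg}(Seg, G_1, S) \tag{6}$$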

C. Training and Testing

In all experiments, the lambdas were empirically set to λ1 = 1, λ2 = 1, λ3 = 10, λ4 = 10, λ5 = 1. The values of λ1 to λ4 were the same as in the original CycleGAN paper [5], while λ5 was simply set to 1 without tuning for different applications in this study. The Adam optimizer [5] was used to minimize Ltotal. All networks had one input channel and one output channel, except Seg, which had seven output channels. The Adam learning rate was 0.0001 for G1, G2 and Seg and 0.0002 for D1 and D2.
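The weighted sum in Equation (6) with the λ values above amounts to the following arithmetic. The dictionary keys and the individual loss values are hypothetical toy numbers chosen only to show the weighting; they are not results from the paper.

```python
# Weights from the text: lambda_1 = lambda_2 = 1, lambda_3 = lambda_4 = 10, lambda_5 = 1.
LAMBDAS = {"gan_1": 1.0, "gan_2": 1.0, "cycle_s": 10.0, "cycle_t": 10.0, "seg": 1.0}

def total_loss(terms, lambdas=LAMBDAS):
    """Weighted sum of the five loss terms, as in Equation (6)."""
    if set(terms) != set(lambdas):
        raise ValueError("terms must contain exactly the five named losses")
    return sum(lambdas[name] * terms[name] for name in lambdas)

# Toy loss values (assumed, for illustration only).
example_terms = {"gan_1": 0.7, "gan_2": 0.6, "cycle_s": 0.05, "cycle_t": 0.04, "seg": 0.3}
example_total = total_loss(example_terms)  # 0.7 + 0.6 + 0.5 + 0.4 + 0.3
```

With these weights, a cycle reconstruction error contributes ten times as much per unit as an adversarial or segmentation term.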

In the testing stage, only the segmentation network Seg was used by SynSeg-Net (Figure 3). To segment a testing scan in the target modality, the testing scan was first normalized to 0-1. Next, the axial slices from the normalized testing image volume were resampled to 256 × 256 using bilinear interpolation and segmented by Seg. Last, the final segmentation slices were resampled to the original resolution using nearest neighbor interpolation and were concatenated. During training, the 2D slices were sampled randomly across all scans, without forcing each batch to contain only consecutive slices or only slices from the same subject.
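Nearest neighbor interpolation is the natural choice for resampling label maps because every output pixel copies an existing label, so no invalid intermediate class values are created. A minimal NumPy sketch of such a resampling step follows; the function name `resize_nearest` and the pixel-center sampling convention are assumptions, and this is not the authors' MATLAB postprocessing code.

```python
import numpy as np

def resize_nearest(label_slice, out_shape):
    """Resample a 2D integer label map to out_shape by nearest neighbor."""
    label_slice = np.asarray(label_slice)
    in_h, in_w = label_slice.shape
    out_h, out_w = out_shape
    # Map each output pixel center to the input pixel that contains it.
    rows = ((np.arange(out_h) + 0.5) * in_h / out_h).astype(np.int64)
    cols = ((np.arange(out_w) + 0.5) * in_w / out_w).astype(np.int64)
    rows = np.minimum(rows, in_h - 1)
    cols = np.minimum(cols, in_w - 1)
    return label_slice[rows[:, None], cols[None, :]]

# Toy usage: upsample a 2 x 2 label map to 4 x 4.
labels = np.array([[0, 1], [2, 3]])
upsampled = resize_nearest(labels, (4, 4))
```

In the pipeline described above, the same operation would map each 256 × 256 predicted slice back to the original in-plane resolution before the slices are stacked into a volume.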

The experiments were performed on an Ubuntu workstation with an NVIDIA Titan GPU (12 GB memory) and CUDA 8.0. The preprocessing and processing code was implemented in MATLAB 2016a (www.mathworks.com), while the SynSeg-Net code was implemented in Python 2.7 (www.python.org). For the DCNN methods, PyTorch 0.2 (www.pytorch.org) was used to establish the network structures and perform training.

D. Evaluation Metrics

The Dice similarity coefficient (DSC) was employed to evaluate different approaches by comparing their segmentation results against the ground truth voxel-by-voxel. Differences between methods were evaluated by Wilcoxon signed rank test [38] with a significance threshold of p<0.05.
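For reference, the DSC between a predicted mask and a ground truth mask is twice the overlap divided by the sum of the two mask sizes. A minimal NumPy sketch is given below; the function name `dice` and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

# Toy usage: one of two predicted voxels overlaps one of two true voxels.
score = dice([1, 1, 0, 0], [1, 0, 1, 0])
```

Per-scan DSC values computed this way for two methods on the same test scans form the paired samples that a signed rank test compares.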

IV. Experimental Design and Results

We conducted experiments on two different applications to evaluate the relative effectiveness of different approaches. The first application is MRI to CT splenomegaly synthetic segmentation. The second application is CT to MRI TICV synthetic segmentation. In the first experiment, we first employed the target abdominal CT intensity images in the training (without using the manual labels), which would provide the best synthetic segmentation performance since the target intensity images were used in the synthesis learning. Then, we used an independent CT cohort for validation.

A. MRI-to-CT Splenomegaly Synthetic Segmentation for Abdomen

1). Data:

A collection of 60 clinically acquired whole abdomen MRI T2w scans as well as 19 clinically acquired whole abdomen CT scans from splenomegaly patients were used as the training and testing data. The MRI and CT scans were acquired in the axial plane. In total, 3262 MRI slices and 1874 CT slices were used in the experiments.

2). Experimental Design:

CT Segmentation with CT Manual Labels

First, our previously developed spleen segmentation network (SSNet) [31] (trained on 75 normal spleen CT scans) was employed to assess the performance of a network trained on normal spleens when applied to splenomegaly scans.

Then, multi-atlas segmentation and a residual FCN network were used as two baseline methods, which followed traditional segmentation strategies: both were trained on the 19 splenomegaly CT scans and the corresponding manual spleen labels in a leave-one-out cross validation manner. Briefly, the adaptive Gaussian mixture model multi-atlas segmentation (AGMM MAS) was used as the first baseline method, which has shown superior performance on splenomegaly segmentation [39]. The second baseline approach employed the 9 block ResNet FCN [5, 36]. To compare with the synthetic segmentation methods, the network structure and the hyperparameters of the ResNet were kept exactly the same as the generators and segmentation networks in SynSeg-Net. This method evaluated the performance of traditional supervised DCNN segmentation, which used the manual labels in the target imaging modality during training. Since only spleen manual labels were available in the CT domain, the supervised learning methods using CT manual labels provided spleen segmentation results only.

CT Segmentation Without CT Manual Labels

Then, we evaluated the performance of synthetic segmentation, which did not use the manual labels in the target imaging modality during training. In this section, the two-stage CycleGAN+Seg. strategy proposed by Chartsias et al. [35] as well as the proposed end-to-end SynSeg-Net were evaluated. For a fair comparison, the network structures of CycleGAN+Seg. and SynSeg-Net were the same, except that SynSeg-Net connected the synthesis and segmentation networks in an end-to-end training. Briefly, the CycleGAN+Seg. strategy first trained the CycleGAN network to obtain 60 synthetic CT scans from 60 real MRI scans. Then the manual labels from the real MRI scans as well as the corresponding synthetic CT scans were used to train an independent 9 block ResNet network. Hence, two independent training phases were used.

By contrast, the proposed SynSeg-Net integrated the synthesis and segmentation training phases into an end-to-end training framework. Examples of real, synthesized, reconstructed and segmentation images for Path A and Path B are shown in Figure 4.

Fig. 4.


The intermediate results of the real, synthesized, and reconstructed images as well as segmentations in training Path A and Path B.

We also performed an experiment that trained the SynSeg-Net only using the source to target path (from MRI to CT) without the target to source path. The experiment only used G1 and T in the half cycle (HC), which was called SynSeg-Net-HC. This experiment presented the segmentation performance with/without the complete cycle.

All networks were trained and validated for 100 epochs. The epoch with the highest mean DSC between the predicted and manual segmentations on the 19 splenomegaly CT scans was reported in the results. The best performance of ResNet (epoch=90) was obtained from leave-one-subject-out validation. The best performance of SSNet (epoch=10), SynSeg-Net-HC (epoch=10), CycleGAN+Seg. (epoch=50) and SynSeg-Net (epoch=40) was evaluated as external validation, since labels for the 19 splenomegaly CT scans were never used in the training. Since liver, left kidney, right kidney and stomach manual labels were available in addition to spleen labels in MRI, the corresponding automatic organ segmentation results are also presented qualitatively in Figure 5 for SynSeg-Net-HC, CycleGAN+Seg., and SynSeg-Net. However, we did not evaluate the results for organs other than the spleen since (1) we did not have manual labels for the remaining organs in the CT domain, and (2) the purpose of this experiment was to perform spleen segmentation.

Fig. 5.


The qualitative results are presented in this figure, including (1) three canonical methods using CT manual labels in CT segmentation, and (2) CycleGAN+Seg. and the proposed SynSeg-Net methods without using CT manual labels. The splenomegaly CT labels were only used in validation and excluded from training for (2). Moreover, the latter methods not only performed spleen segmentation but also estimated labels for other organs, which were not provided by the canonical methods since such labels were not available on CT.

3). Results:

The qualitative and quantitative results are shown in Figures 5 and 6, respectively. Three subjects with the largest, median and smallest DSC of SynSeg-Net are presented. From the results, SynSeg-Net was not only able to perform spleen segmentation, but also estimated segmentations of the liver, left kidney, right kidney and stomach. The “*” indicates that the difference between methods was significant, while “N.S.” means not significant. Average surface distance (ASD) measurements (median, mean, and standard deviation (Std)) are presented together with the DSC measurements in Table 1.

Fig. 6.


The boxplot results of all CT splenomegaly testing images, where “*” means the difference is significant at p<0.05, while “N.S.” means not significant.

TABLE I.

Dice similarity score (DSC) and average surface distance (ASD) for CT splenomegaly testing images.

| Metric | SSNet | AGMM MAS | Seg. | SynSeg-Net-HC | CycleGAN+Seg. | SynSeg-Net |
|---|---|---|---|---|---|---|
| Median DSC | 0.679 | 0.912 | 0.911 | 0.628 | 0.880 | 0.919 |
| Mean±Std DSC | 0.630±0.269 | 0.861±0.101 | 0.911±0.040 | 0.605±0.084 | 0.878±0.056 | 0.895±0.063 |
| Median ASD* | 8.882 | 3.164 | 2.005 | 15.181 | 5.835 | 2.864 |
| Mean±Std ASD* | 18.340±27.991 | 6.726±7.710 | 3.004±2.797 | 14.383±4.521 | 5.600±3.619 | 3.898±3.397 |

* The unit for ASD-related measurements is millimeter (mm).

Without using CT labels, SynSeg-Net achieved significantly superior performance compared with the CycleGAN+Seg. and SynSeg-Net-HC methods, while achieving comparable performance to the baseline ResNet segmentation network using CT labels.

B. External Validation for MRI-to-CT Splenomegaly Synthetic Segmentation

In the previous experiment, the target images were used in the training stage (only intensity images were used and the labels were excluded). This strategy would provide the best performance of the proposed SynSeg-Net since the target images were used to model the target distribution. However, the training stage needs to be performed again for an unseen target image. A more general strategy is to apply the trained model to new target images directly. In this experiment, we employed an independent external validation cohort to evaluate the performance of the baseline segmentation network, the two-stage CycleGAN+Seg. network, and the proposed SynSeg-Net.

1). Data:

A set of 66 whole abdomen CT scans from an independent study was used as the external validation data to evaluate the performance of the different methods. Among the entire cohort, 23 scans were axial acquisitions, 21 scans were coronal acquisitions, and 22 scans were sagittal acquisitions (Figure 7).

Fig. 7.


The qualitative results are presented in this figure. The three rows show the three types of scans in the external validation cohort: (1) axial acquisition, (2) coronal acquisition, and (3) sagittal acquisition. The results in the corresponding acquisition view are presented in the left panels, while the results for the remaining views are shown in the right panels.

2). Experimental Design:

The same preprocessing was performed on the 66 CT scans before segmentation. Then, the trained baseline ResNet segmentation network (19 CT + CT labels) and the trained CycleGAN+Seg. and SynSeg-Net (60 MRI and 19 CT + MRI labels) from the previous experiment were applied to segment the 66 external validation CT scans directly. The parameters and the presented epochs were the same as in the previous experiment, without additional training or fine-tuning.

3). Results:

The qualitative and quantitative results are shown in Figures 7 and 8, respectively. The “*” indicates that the difference between methods was significant, while “N.S.” means not significant.

Fig. 8.


The boxplot results of all CT splenomegaly external validation images, where “*” means the difference is significant at p<0.05, while “N.S.” means not significant. Seg. was the ResNet segmentation network using manual labels in CT. Without using CT manual labels, CycleGAN+Seg. and SynSeg-Net were the two-stage and end-to-end network designs, respectively.

In Figure 8, boxplots are presented to compare the segmentation results. Without using CT labels, SynSeg-Net achieved significantly superior performance compared with the CycleGAN+Seg. method, while achieving comparable performance to the baseline ResNet segmentation network using CT labels for sagittal acquisition scans. The corresponding ASD measurements are presented with the DSC measurements in Table 2.

TABLE II.

Dice similarity score (DSC) and average surface distance (ASD) for all CT splenomegaly external validation images.

| Acquisition | Metric | Seg. | Cycle+Seg. | SynSeg-Net |
|---|---|---|---|---|
| Axial | Median DSC | 0.937 | 0.894 | 0.925 |
| Axial | Mean±Std DSC | 0.929±0.025 | 0.884±0.057 | 0.906±0.058 |
| Axial | Median ASD* | 2.073 | 3.698 | 2.696 |
| Axial | Mean±Std ASD* | 2.276±1.005 | 4.074±2.216 | 3.157±1.804 |
| Coronal | Median DSC | 0.921 | 0.883 | 0.906 |
| Coronal | Mean±Std DSC | 0.914±0.031 | 0.876±0.037 | 0.882±0.087 |
| Coronal | Median ASD* | 1.840 | 3.508 | 2.125 |
| Coronal | Mean±Std ASD* | 2.116±0.772 | 4.281±3.124 | 2.939±2.127 |
| Sagittal | Median DSC | 0.902 | 0.877 | 0.893 |
| Sagittal | Mean±Std DSC | 0.863±0.166 | 0.850±0.066 | 0.872±0.064 |
| Sagittal | Median ASD* | 2.308 | 3.863 | 2.932 |
| Sagittal | Mean±Std ASD* | 3.934±5.733 | 5.083±3.294 | 4.043±2.905 |

* The unit for ASD-related measurements is millimeter (mm).

C. CT-to-MRI TICV Synthetic Segmentation for Brain

The previous two experiments evaluated the performance of MRI to CT synthetic segmentation for an abdominal organ. There, the synthesis learned a modality with less tissue context (abdominal CT) from a modality with richer tissue context (abdominal MRI). In this experiment, we evaluate the performance of CT to MRI synthetic segmentation for the brain, which is a more challenging task since brain MRI has much richer tissue context than brain CT. Therefore, this experiment evaluated the performance of synthetic segmentation on a modality with richer context (brain MRI), whose training images were synthesized from a modality with less context (brain CT).

1). Data:

Twenty subjects with both whole brain MRI and CT scans were used in this experiment. True TICV labels were available for all 20 subjects, whose imaging parameters and atlas generation were described in [40]. To evaluate the SynSeg-Net performance, we separated the 20 subjects into two groups, group CT and group MRI. Group CT contained the first half (10 subjects), for which we supposed only CT scans were available. Group MRI contained the remaining half (10 subjects), for which we supposed only MRI scans were available. Therefore, we had two independent unpaired training sets: 10 scans with CT as the source modality and 10 scans with MRI as the target modality. Since we aimed to train an MRI TICV segmentation network, we only used the true TICV labels for the 10 CT scans in the training. The true TICV labels for the 10 MRI scans were excluded from training and only used during validation.

2). Experimental Design:

The 10 unpaired CT scans and 10 MRI scans, as well as the true TICV labels on CT, were used to train SynSeg-Net. After training, an MRI TICV segmentation network was obtained without using any TICV labels on MRI. The preprocessing steps were the same as for the abdomen scans. The same baseline segmentation network, CycleGAN+Seg. network, and the proposed SynSeg-Net were employed in the validation. The hyperparameters for training all deep networks were kept the same as in the experiments for abdomen scans. Examples of real, synthesized, reconstructed and segmentation images for Path A and Path B are shown in Figure 9. Since external validation data were not available for MRI TICV segmentation, all the methods were compared consistently at epoch 200.

Fig. 9.


The intermediate results of the real, synthesized, and reconstructed images as well as segmentations in training Path A and Path B.

3). Results:

The qualitative and quantitative results are shown in Figures 10 and 11, respectively. The “*” indicates that the difference between methods was significant, while “N.S.” means not significant.

Fig. 10.


The qualitative results are presented in this figure. The three rows show the three subjects with the largest, median and lowest DSC of SynSeg-Net.

Fig. 11.


The boxplot results of all MRI TICV testing images, where “*” means the difference is significant at p<0.05, while “N.S.” means not significant. Seg. was the ResNet segmentation network using manual TICV labels in MRI. Without using MRI manual labels, CycleGAN+Seg. and SynSeg-Net were the two-stage and end-to-end network designs, respectively.

In Figure 11, boxplots are presented to compare the segmentation results. Without using MRI TICV labels, SynSeg-Net achieved significantly superior performance compared with the CycleGAN+Seg. method, while yielding inferior performance compared with the baseline ResNet segmentation network using MRI TICV labels. The corresponding ASD measurements are presented with the DSC measurements in Table 3.

TABLE III.

Dice similarity score (DSC) and average surface distance (ASD) for MRI TICV testing images

| Metric | Seg. | Cycle+Seg. | SynSeg-Net |
|---|---|---|---|
| Median DSC | 0.982 | 0.952 | 0.966 |
| Mean±Std DSC | 0.979±0.006 | 0.952±0.010 | 0.963±0.008 |
| Median ASD* | 0.803 | 2.118 | 1.312 |
| Mean±Std ASD* | 0.987±0.458 | 2.322±0.976 | 1.441±0.318 |

* The unit for ASD-related measurements is millimeter (mm).

V. Conclusion and Discussion

SynSeg-Net enables training of a deep convolutional segmentation network without ground truth labels in the target modality. Figure 6 showed that the SSNet trained on normal spleen CT images was significantly worse than the other methods. The proposed SynSeg-Net method was significantly better than the two-stage CycleGAN+Seg. method. Without using CT labels, SynSeg-Net achieved comparable performance to the AGMM MAS and ResNet methods that used CT labels. By contrast, the performance of CycleGAN+Seg. was significantly worse than that of ResNet. Then, without including target intensity images during training, the proposed SynSeg-Net approach resulted in significantly superior performance compared with the CycleGAN+Seg. method, while achieving comparable performance to the baseline ResNet segmentation network using CT labels for sagittal acquisition scans (Figure 8). Last, for the CT to MRI TICV synthetic segmentation, the proposed SynSeg-Net approach resulted in significantly superior performance compared with the CycleGAN+Seg. method (Figure 11).

One major limitation of deep learning based segmentation is its limited ability to generalize across applications, domains, and imaging modalities. Figure 6 showed that the SSNet trained on normal spleens did not provide decent segmentation performance on splenomegaly, even though both tasks involved the same organ in the same imaging modality. One solution is to label a set of images for the new application; however, manual tracing is resource intensive. Therefore, it is appealing to reuse previously labeled images from another modality, as the proposed SynSeg-Net does. Moreover, the images from the source and target domains did not need to be paired. To remain potentially compatible with applications other than spleen segmentation, we employed ResNet instead of SSNet for the generators and segmentation subnet, since ResNet is one of the most widely used networks across a variety of medical image processing tasks. As shown in Table 1, the proposed SynSeg-Net, without using splenomegaly CT manual labels, achieved superior performance compared with the Cycle+Seg. method and performance comparable to that of ResNet trained with splenomegaly CT manual labels. As shown in Table 3, although inferior to training directly with target modality manual labels, the performance of SynSeg-Net was better than that of Cycle+Seg. for TICV segmentation.

In SynSeg-Net, a 9-block ResNet was used for the generators and the segmentation network, as this architecture was validated in the original CycleGAN paper [5]. PatchGAN was used for the discriminators. While this combination was successful, we do not claim that ResNet or PatchGAN is optimal. Other image-to-image generators, segmentation networks, or discriminators might yield better performance for synthetic segmentation. Since the proposed method is an open framework, users are encouraged to explore alternatives to the components used in this study. In the future, it would also be interesting to evaluate inter-rater reliability by including manual segmentations from different human raters. Such results would also provide a comparison between automatic methods and human experts.

In SynSeg-Net, a 2D network was used to perform synthesis and segmentation, since the numbers of training images in both experiments were not large enough to train a reasonable 3D network. However, the proposed SynSeg-Net can be extended to a 3D framework, and we hypothesize that it would yield better 3D segmentation performance if a large number of training scans were available. In fact, Zhang et al. [3] have shown promising results when performing 3D synthesis and segmentation. In this study, 2D slice-wise nearest neighbor interpolation was used to resample each manual segmentation volume to be compatible with the network. This step could be replaced by other resampling methods (e.g., a registration-based method), which might yield better accuracy for larger numbers of output labels. During training, the 2D slices were sampled randomly across all scans. Therefore, scans with more slices contributed more data during training, which would introduce bias due to the unbalanced sampling. Such bias could be alleviated in the future by designing a balanced sampling strategy (e.g., performing interpolation on the raw scans).
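The sampling bias described above can be illustrated with a small sketch. It contrasts drawing slices uniformly from a pooled list of all slices (where longer scans dominate) with one possible balanced alternative: picking a scan uniformly first and then a slice within it. This two-step scheme is an assumed illustration, not the interpolation-based strategy suggested in the text, and the function names and slice counts are hypothetical.

```python
import random
from collections import Counter


def sample_pooled(slices_per_scan, n, rng):
    """Draw n (scan, slice) pairs uniformly from the pool of all slices."""
    pool = [(scan, k)
            for scan, count in enumerate(slices_per_scan)
            for k in range(count)]
    return [rng.choice(pool) for _ in range(n)]


def sample_balanced(slices_per_scan, n, rng):
    """Pick a scan uniformly, then a slice uniformly within that scan."""
    samples = []
    for _ in range(n):
        scan = rng.randrange(len(slices_per_scan))
        k = rng.randrange(slices_per_scan[scan])
        samples.append((scan, k))
    return samples


rng = random.Random(0)
slices_per_scan = [20, 200]  # hypothetical: one short scan, one long scan

pooled_counts = Counter(scan for scan, _ in sample_pooled(slices_per_scan, 10000, rng))
balanced_counts = Counter(scan for scan, _ in sample_balanced(slices_per_scan, 10000, rng))
# With pooled sampling the long scan supplies roughly 200/220 of the draws;
# with balanced sampling each scan supplies roughly half.
```

A trade-off of the balanced scheme is that slices from short scans are revisited more often, so it equalizes scans rather than individual slices.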

The quantitative evaluation of “how good are the synthesized images?” is still an open question in our community, especially when the synthesized images are used as intermediate data for segmentation. A recent study even suggested that certain synthesized images should not be used for direct interpretation, since they may lead to misdiagnosis when transferring the distributions [41]. Therefore, we focused on developing a new strategy for image segmentation rather than providing a new image synthesis method. As a result, the quantitative evaluations in this paper were performed on the segmentation rather than on the synthesis. Moreover, unpaired images were used in this study, without multi-modal images from the same patient, which also limited the validation of the synthesis. The proposed end-to-end framework might benefit the synthesis compared with the two-stage design. For instance, the intermediate results in the end-to-end framework might provide “better” intermediate representations for segmentation. Thus, the final segmentation performance was used as the metric across different segmentation tasks to compare the proposed method with the two-stage method. In summary, the proposed end-to-end method achieved consistently superior segmentation performance compared with the two-stage design across different tasks.

Acknowledgment

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

This work was supported in part by the Intramural Research Program, National Institute on Aging, NIH, in part by NIH under Grants 5R01NS056307 and 5R21NS082891, in part by the Advanced Computing Center for Research and Education (ACCRE), Vanderbilt University, Nashville, TN, USA, in part by ViSE/VICTR VR3029, in part by the National Center for Research Resources under Grant UL1 RR024975-01, in part by the National Center for Advancing Translational Sciences under Grant 2 UL1 TR000445-06, in part by the NIH S10 Shared Instrumentation under Grant 1S10OD020154-01 (Smith), in part by Vanderbilt IDEAS under Grant (Holly-Bockelmann, Walker, Meliler, Palmeri, Weller), and in part by ACCRE, Vanderbilt University, through Big Data TIPs Grant. The work of M. R. Savona, R. G. Abramson, and B. A. Landman was supported by Vanderbilt-Incyte Research Alliance Grant. The work of R. G. Abramson and B. A. Landman was supported by Incyte Corporation. The work of B. A. Landman was supported in part by NSF under Grant CAREER 1452485, in part by NIH under Grants 5R21EY024036, R01EB017230, 1R21NS064534 (Prince), 1R01NS070906 (Pham), 2R01EB006136 (Dawant), 1R03EB012461, and R01NS095291 (Dawant), and in part by the NCI Cancer Center Support under Grant P30 CA068485.

Contributor Information

Yuankai Huo, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 USA.

Zhoubing Xu, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 USA.

Hyeonsoo Moon, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 USA.

Shunxing Bao, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 USA.

Albert Assad, Incyte Corporation, Wilmington, DE 19803 USA.

Tamara K. Moyo, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37235 USA.

Michael R. Savona, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37235 USA.

Richard G. Abramson, Department of Radiology and Radiological Science, Vanderbilt University Medical Center, Nashville, TN 37235 USA.

Bennett A. Landman, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 USA.

References

  • [1] Frangi AF, Tsaftaris SA, and Prince JL, "Simulation and Synthesis in Medical Imaging," IEEE Transactions on Medical Imaging, vol. 37, pp. 673–679, 2018.
  • [2] Miller MI, Christensen GE, Amit Y, and Grenander U, "Mathematical textbook of deformable neuroanatomies," Proceedings of the National Academy of Sciences, vol. 90, pp. 11944–11948, 1993.
  • [3] Zhang Z, Yang L, and Zheng Y, "Translating and Segmenting Multimodal Medical Volumes with Cycle-and Shape-Consistency Generative Adversarial Network," arXiv preprint arXiv:1802.09655, 2018.
  • [4] Iglesias JE, Konukoglu E, Zikic D, Glocker B, Van Leemput K, and Fischl B, "Is synthesizing MRI contrast useful for inter-modality analysis?," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2013, pp. 631–638.
  • [5] Zhu J-Y, Park T, Isola P, and Efros AA, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv preprint arXiv:1703.10593, 2017.
  • [6] Huo Y, Xu Z, Bao S, Assad A, Abramson RG, and Landman BA, "Adversarial synthesis learning enables segmentation without target modality ground truth," in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, 2018, pp. 1217–1220.
  • [7] Burgos N, Cardoso MJ, Thielemans K, Modat M, Pedemonte S, Dickson J, et al., "Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies," IEEE transactions on medical imaging, vol. 33, pp. 2332–2341, 2014.
  • [8] Cardoso MJ, Sudre CH, Modat M, and Ourselin S, "Template-based multimodal joint generative model of brain data," in International Conference on Information Processing in Medical Imaging, 2015, pp. 17–29.
  • [9] Hertzmann A, Jacobs CE, Oliver N, Curless B, and Salesin DH, "Image analogies," in Proceedings of the 28th annual conference on Computer graphics and interactive techniques, 2001, pp. 327–340.
  • [10] Ye DH, Zikic D, Glocker B, Criminisi A, and Konukoglu E, "Modality propagation: coherent synthesis of subject-specific scans with data-driven regularization," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2013, pp. 606–613.
  • [11] Roy S, Carass A, and Prince JL, "Magnetic resonance image example-based contrast synthesis," IEEE transactions on medical imaging, vol. 32, pp. 2348–2363, 2013.
  • [12] Roy S, Carass A, and Prince J, "A compressed sensing approach for MR tissue contrast synthesis," in Biennial International Conference on Information Processing in Medical Imaging, 2011, pp. 371–383.
  • [13] Jog A, Carass A, Roy S, Pham DL, and Prince JL, "Random forest regression for magnetic resonance image synthesis," Medical image analysis, vol. 35, pp. 475–488, 2017.
  • [14] Vemulapalli R, Van Nguyen H, and Kevin Zhou S, "Unsupervised cross-modal synthesis of subject-specific scans," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 630–638.
  • [15] Huang Y, Beltrachini L, Shao L, and Frangi AF, "Geometry regularized joint dictionary learning for cross-modality image synthesis in magnetic resonance imaging," in International Workshop on Simulation and Synthesis in Medical Imaging, 2016, pp. 118–126.
  • [16] Huang Y, Shao L, and Frangi AF, "Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding," arXiv preprint arXiv:1705.02596, 2017.
  • [17] Van Nguyen H, Zhou K, and Vemulapalli R, "Cross-domain synthesis of medical images using efficient location-sensitive deep network," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 677–684.
  • [18] Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al., "A survey on deep learning in medical image analysis," Medical image analysis, vol. 42, pp. 60–88, 2017.
  • [19] Xiang L, Wang Q, Nie D, Zhang L, Jin X, Qiao Y, et al., "Deep embedding convolutional neural network for synthesizing CT image from T1-Weighted MR image," Medical image analysis, vol. 47, pp. 31–44, 2018.
  • [20] Nie D, Trullo R, Lian J, Petitjean C, Ruan S, Wang Q, et al., "Medical image synthesis with context-aware generative adversarial networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 417–425.
  • [21] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al., "Generative adversarial nets," in Advances in neural information processing systems, 2014, pp. 2672–2680.
  • [22] Isola P, Zhu J-Y, Zhou T, and Efros AA, "Image-to-image translation with conditional adversarial networks," arXiv preprint, 2017.
  • [23] Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, and Webb R, "Learning from simulated and unsupervised images through adversarial training," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, p. 6.
  • [24] Bousmalis K, Silberman N, Dohan D, Erhan D, and Krishnan D, "Unsupervised pixel-level domain adaptation with generative adversarial networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, p. 7.
  • [25] Osokin A, Chessel A, Salas REC, and Vaggi F, "Gans for biological image synthesis," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2252–2261.
  • [26] Costa P, Galdran A, Meyer MI, Abràmoff MD, Niemeijer M, Mendonça AM, et al., "Towards adversarial retinal image synthesis," arXiv preprint arXiv:1701.08974, 2017.
  • [27] Roth HR, Lu L, Lay N, Harrison AP, Farag A, Sohn A, et al., "Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation," Medical image analysis, vol. 45, pp. 94–107, 2018.
  • [28] Kohl S, Bonekamp D, Schlemmer H-P, Yaqubi K, Hohenfellner M, Hadaschik B, et al., "Adversarial Networks for the Detection of Aggressive Prostate Cancer," arXiv preprint arXiv:1702.08014, 2017.
  • [29] Xue Y, Xu T, Zhang H, Long R, and Huang X, "SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation," arXiv preprint arXiv:1706.01805, 2017.
  • [30] Yang D, Xu D, Zhou SK, Georgescu B, Chen M, Grbic S, et al., "Automatic liver segmentation using an adversarial image-to-image network," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 507–515.
  • [31] Huo Y, Xu Z, Bao S, Bermudez C, Plassard AJ, Liu J, et al., "Splenomegaly segmentation using global convolutional kernels and conditional generative adversarial networks," in Medical Imaging 2018: Image Processing, 2018, p. 1057409.
  • [32] Kamnitsas K, Baumgartner C, Ledig C, Newcombe V, Simpson J, Kane A, et al., "Unsupervised domain adaptation in brain lesion segmentation with adversarial networks," in International Conference on Information Processing in Medical Imaging, 2017, pp. 597–609.
  • [33] Wolterink JM, Dinkla AM, Savenije MH, Seevinck PR, van den Berg CA, and Išgum I, "Deep MR to CT Synthesis Using Unpaired Data," in International Workshop on Simulation and Synthesis in Medical Imaging, 2017, pp. 14–23.
  • [34] Zhao C, Carass A, Lee J, Jog A, and Prince JL, "A Supervoxel Based Random Forest Synthesis Framework for Bidirectional MR/CT Synthesis," in International Workshop on Simulation and Synthesis in Medical Imaging, 2017, pp. 33–40.
  • [35] Chartsias A, Joyce T, Dharmakumar R, and Tsaftaris SA, "Adversarial Image Synthesis for Unpaired Multi-modal Cardiac Data," in International Workshop on Simulation and Synthesis in Medical Imaging, 2017, pp. 3–13.
  • [36] Johnson J, Alahi A, and Fei-Fei L, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision, 2016, pp. 694–711.
  • [37] Isola P, Zhu J-Y, Zhou T, and Efros AA, "Image-to-image translation with conditional adversarial networks," arXiv preprint arXiv:1611.07004, 2016.
  • [38] Wilcoxon F, "Individual comparisons by ranking methods," Biometrics bulletin, pp. 80–83, 1945.
  • [39] Liu J, Huo Y, Xu Z, Assad A, Abramson RG, and Landman BA, "Multi-atlas spleen segmentation on CT using adaptive context learning," in Medical Imaging 2017: Image Processing, 2017, p. 1013309.
  • [40] Huo Y, Asman AJ, Plassard AJ, and Landman BA, "Simultaneous total intracranial volume and posterior fossa volume estimation using multiatlas label fusion," Human brain mapping, vol. 38, pp. 599–616, 2017.
  • [41] Cohen JP, Luck M, and Honari S, "Distribution Matching Losses Can Hallucinate Features in Medical Image Translation," arXiv preprint arXiv:1805.08841, 2018.
